Partitioning Program into Hardware and Software

Shengchao Qin*
Department of Informational Sciences
School of Mathematical Sciences
Peking University, Beijing, 100871
qinshc@pubms.pku.edu.cn

Jifeng He†
International Institute for Software Technology
The United Nations University
P.O.Box 3058, Macau
jifeng@iist.unu.edu

Abstract

Hardware and software co-design is a design technique which delivers computer systems comprising hardware and software components. A critical phase of co-design process is to decompose a program into hardware and software. This paper proposes an algebraic partitioning method whose correctness is verified in the algebra of programs. We introduce the program analysis phase before program partitioning and develop a collection of syntax-based splitting rules, where the former provides the information for moving operations from software to hardware and reducing the interaction between components, and the latter supports a compositional approach to the program partitioning.

1. Introduction

The design of a complex software product like a nuclear reactor control system is ideally decomposed into a progression of related phases. It starts with an investigation of the properties and behaviours of the process evolving within its environment, and an analysis of requirement for its safety performance. From these is derived a specification of the electronic or program-centered components of the system. The project then may go through a series of design phases, ending in a program expressed in a high level language. After translation into a machine code of the chosen computer, it is executed at high speed by electronic circuitry. In order to achieve the time performance required by the customer, additional application-specific hardware devices may be needed to embed the computer into the system which it controls.

With chip size reaching one million transistors, the complexity of VLSI algorithms is approaching that of software algorithms. However, the design methods for circuits resemble the low level machine language programming methods. Selecting individual gates and registers in a circuit like selecting individual machine instruction in a program. State transition diagrams are like flowcharts. These methods may have been adequate for small circuit design when they were introduced, but they are not adequate for circuits that perform complicated algorithms. Industry interest in the formal verification of embedded systems is gaining ground since an error in a widely used hardware device can have significant repercussions on the stock value of the company concerned. In principle, proof of correctness of a digital device can always be achieved by making a comparison of the behavioral description of the circuit with its specification. But for a large system this would be impossibly laborious. What we need is a useful collection of proven equations and other theorems, which can be used to calculate, manipulate and transform the specification formulated for the product.

Hardware/software co-design is a design technique which delivers computer systems comprising hardware and software components. A critical phase of co-design process is to partition a program into hardware and software. This paper proposes a partitioning method whose correctness is verified using the algebraic laws developed for the high level programming language. To meet performance goals, and reduce the communication between components, our approach combines the program analysis technique with the syntax-based splitting rules to move heavy-weight operations from software to hardware. The allocation of variables is also based on the data flow analysis of the source program. One of the advantages of our method is the integration of the splitting phase with the joining phase of the partitioning process. It optimizes the underlying target architecture, and facilitates the reuse of hardware devices.

The algebraic approach advocated in this paper to verify the correctness of the partitioning process has been successfully employed in the ProCoS project on "Provably Correct Systems". The original ProCoS project formulated almost exclusively on the verification of standard compiler
of a high-level programming language based on Occam down to a microprocessor based on Transputer [5]. Sampio showed how to reduce the compiler design task to one of program transformation; his formal framework is also a procedural language and its algebraic laws [14]. Towards the end of the first phase of the project, Ian Page et al made rapid advance in the development of hardware compilation technique using an Occam-like language targeted towards Field Programmable Gate Arrays [11], and He Jifeng et al provided a formal verification of the hardware compilation scheme within the algebra of Occam programs [4].

Recently, some works have suggested the use of formal methods for the partitioning process [1, 2, 15]. Balboni et al adopt Occam as an internal model for the system exploration and partitioning strategy. Cheung pursues the structural transformation and verification within the functional programming framework. However, neither has provided a formal proof for the correctness of the partitioning process. In [15], Silva et al provide a formal strategy for carrying out the splitting phase automatically, and presents an algebraic proof for its correctness. However, the splitting phase delivers a large number of simple processes, and leaves the hard task of clustering these processes into hardware and software components to the clustering phase and the joining phase. Furthermore, additional channels and local variables introduced in the splitting phase to accommodate huge number of parallel processes actually increase the data flow between the hardware and software components.

The remainder of this paper is organized as follows. Section 2 describes the splitting strategy. Section 3 introduces the programming language we adopt and explores its algebraic laws. Section 4 poses the static analysis that we perform on the source program. Section 5 investigates the underlying target architecture of hardware/software components. Section 6 provides the syntax-based hardware/software splitting rules in both bottom-up and top-down styles.

2. Splitting Strategy

This section describes our partitioning strategy. A sequential source program of a communication language is generated from the customer’s requirements. A static analysis [10] is performed on the source program in order to provide to the programmer statistical data, such as structural complexities of expressions and their occurrence frequencies, distributive information with respect to the programming language occurring in expressions. Based on the result of the analysis, the programmer marks those parts of the program that are worth to be implemented by hardware and leaves others to software, and as well divides the interface of the program to two disjoint parts.

The implementation-oriented program marking and interface (variable) partitioning are conducted by the following guidelines:

- For the concern of security or other special reasons, some specific blocks will be predetermined to be implemented by hardware or software.
- In general, those procedures which are frequently invoked and those specific blocks that occurs frequently should be marked out to be implemented by hardware, to gain high performances.
- Some procedures/blocks involving very complicated computation (e.g., containing intricate expressions) should be marked and implemented by hardware, to improve timing performance.
- Busy variables should be allocated to hardware, to make high-speed access available, whereas the remaining variables and large scale data structures, such as large arrays, should be left to software, to achieve lower costs.
- The number of interactions between software and hardware should be minimized since they incur high costs.
- In addition, the customer’s demands concerned with the performance and the cost should also be taken into account.

We take such a marked source program as input of our hardware/software splitting algorithm that generates as output a program comprising two concurrent processes representing software and hardware components respectively.

3. Preliminaries

The language we select to perform hardware/software partitioning is a subset of Occam which was designed for constructing communicating systems.

1. Sequential Process:

\[ S ::= PC \mid S \cdot S \mid if \ b \ S \ else \ S \mid S \cap S \mid b \cdot S \mid (g \ S) \mid declaration \cdot S \]

where 

\[ PC ::= (x := e) \mid skip \mid chaos \mid c!e \mid c?x \mid proc(e, v) \mid (S) \]

and \( g \) is skip or a communication event \( c!e \) or \( d?x \).

2. Parallel Program:

\[ P ::= S \mid P \parallel P \mid declaration \cdot P \]

where 

\[ declaration ::= var.dec \mid chan.dec \mid proc.dec \]

\[ var.dec ::= var \cdot type(v) \]
\[\text{chan}_\text{dec} ::= \text{chan} \ e : \text{type}(e)\]

\[\text{proc}_\text{dec} ::=\]

\[\begin{align*}
\text{procedure} & \quad \text{proc}(\text{in} \ u : \text{type}(u), \ \text{out} \ v : \text{type}(v)) \\
\text{begin} & \quad S \\
\text{end} \]
\end{align*}\]

In the later discussion, we adopt \(\text{Var}(P)\) and \(\text{Chan}(P)\) to denote the set of variables and channels employed by \(P\). Moreover, we will not mention the type information of a variable in a declaration if it is obvious.

As a subset of Occam, the language enjoys a rich set of algebraic laws presented in \([13, 3, 7, 9, 8]\). Here we only explore those algebraic laws which will be employed within the proofs in the following sections.

Successive assignments to the same variable can be combined to one assignment.

L.1 \[x := e; \ z := f = x := f[e/x]\]

Sequential composition is associative, and has left zero \(\text{chaos}\) and unit \(\text{skip}\). It distributes backward over internal and external choices and conditional.

L.2 \[(P, Q); R = P; (Q; R)\]

L.3 \[\text{chaos}; P = \text{chaos}\]

L.4 \[\text{skip}; P = P; \text{skip} = P\]

L.5 \[(P \land Q); R = (P; R) \land (Q; R)\]

L.6 \[(g P)(h Q); R = (g (P; R))\land (h (Q; R))\]

L.7 \[(if \ b \ P \land e \ Q); R = if \ b \ (P; R) \land e \ (Q; R)\]

Assignment distributes forward over conditional.

L.8 \[v := e; if (b(v) \land e \ Q); R = if \ b \ (e; v := e; P) \land e \ (v := e; Q)\]

The input and output event can be renamed as follows.

L.9 \[e \ ? x = \text{var} \ x \bullet (e \ ? x; x := ix)\]

L.10 \[c ! e = \text{var} \ x \bullet (ix := e; c ! x)\]

Iteration is subject to the fixed point theorem.

L.11 \[b \cdot P = if (b \cdot P; b \cdot P) \land e \text{skip}\]

Parallel operator is symmetric and associative, and has \(\text{chaos}\) as zero.

L.12 \[P \land Q = Q \land P\]

L.13 \[P \land (Q \land R) = (P \land Q) \land R\]

L.14 \[\text{chaos} \land P = \text{chaos}\]

Parallel operator also distributes over conditional. It’s disjunctive.

L.15 \[(if \ b \ P \land e \ Q) \land R = if \ b \ (P \land R) \land e \ (Q \land R)\]

provided \(\text{Var}(b) \land \text{Var}(R) = \emptyset\).

L.16 \[(P \land Q) \land R = (P \land R) \land (Q \land R)\]

Local variable declaration enjoys the following laws.

L.17 \[\text{var} \ x \bullet (x := e) = \text{skip}\]

L.18 \[\text{var} \ x \bullet (if \ b \ P \land e \ Q) = if \ b \ (\text{var} \ x \bullet P) \land e \ (\text{var} \ x \bullet Q)\]

provided \(x\) is not free in \(b\).

L.19 \[\text{var} \ x \bullet (P; Q) = P; \text{var} \ x \bullet Q\], provided \(x\) is not free in \(P\).

L.20 \[\text{var} \ x \bullet (P; Q) = \text{var} \ x \bullet P; Q\], provided \(x\) is not free in \(Q\).

The following laws deal with assignment expansion.

L.21 \[(x := e; S) \land T = x := e; (S \land T)\]

The following law is one of the general expansion laws of Occam \([13]\), which deals with the case where two parallel processes are guarded choice constructs.

L.22 \[P = \bigotimes_{i=1}^{n} (g_i \ P_i), Q = \bigotimes_{j=1}^{m} (h_j \ Q_j),\]

where each \(g_i, h_j\) has one of the forms \(c \cdot e, c \cdot x\) or \(\text{skip}\), then \(P \land Q = \bigotimes_{i=1}^{n} (k_i \ R_i),\)

where the pairs \(c \cdot e, c \cdot x\) or \(\text{skip}\) and \(c \cdot e, c \cdot x\) or \(\text{skip}\), where \(c \notin \text{Chan}(Q)\).

(iii) \(R_x = x := \text{skip}\) and \(c \cdot e, c \cdot x\) or \(\text{skip}\) and \(c \cdot e, c \cdot x\) or \(\text{skip}\), where \(c \notin \text{Chan}(P)\).

Corollary 3.1

1. \((c \cdot e; P) \land (c \cdot x; Q) = x := e; (P \land Q)\)

2. \((c \cdot e; (c \cdot x; Q)) = x := e; (P \land Q)\)

3. \((c \cdot e; (c \cdot x; Q)) = x := e; (P \land Q)\)

The proof is presented in \([12]\). We exhibit two derived algebraic laws as follows from those basic ones.

The test of conditional should be evaluated first.

D.L.1 \[if \ b \ P \land e \ Q = \text{var} \ b \bullet (lb := b; if \ lb \ P \land e \ Q)\]

Proof \(\text{RHS}\)

\[= \text{var} \ b \bullet (lb := b; P) \land e \ Q)\]

\[= (\text{LHS})\]

\[= \text{var} \ b \bullet (lb := b; P) \land e \ Q)\]

\[\text{LHS}\]

The condition of iteration is evaluated at the beginning of every loop.

D.L.2 \[b \cdot P = \text{var} \ b \bullet (lb := b; b \bullet (P \land lb := b))\]

The proof is omitted here because of the page limit. It can be found in \([12]\). We introduce an ordering relation between two programs as follows before further discussion.

Definition 3.3 (Refinement)

Given programs \(P, Q\), we say \(Q\) is a refinement of \(P\), denoted as \(P \sqsubseteq Q\), if \(P \sqcap Q = P\) is algebraically provable.

4. The Static Analysis

This section illustrates the static analysis on the source program, which provides plenty of information to the programmer to assist the appropriate implementation-oriented program marking and interface partitioning of the source
program, aiming to gain higher performance and as well achieve lower cost.

The static analysis comprises two parts: the subprogram/expressions analysis and the variable analysis. The output of the subprogram/expressions analysis consists of three kinds of information, which will be presented in three tables, respectively.

- Structural complexity of non-trivial expressions in the program and numbers of their occurrences
- Numbers of invocations of procedures, the complexity of their parameters and their structures
- Complexity of those implementation-undetermined blocks

The complexity of expressions is specified by the function \( \text{complex} \) as follows.

**Definition 4.1** Let \( EXP \) be the set of expressions occurred in the source program, \( \text{complex} : EXP \rightarrow \mathbb{N} \) is inductively defined on the structure of expressions:

\[
\text{complex}(v) = w(\text{type}(v)), \text{ for any variable } v;
\]

\[
\text{complex}(c) = 1, \text{ for any constant } c; \text{ and}
\]

\[
\text{complex}(\text{op}(e_1, \ldots, e_n)) = \sum \text{complex}(e_i) + \text{complex}(\text{op}),
\]

where \( \text{op} \) is any operator used to construct expressions in the source language, and \( \text{complex}(\text{op}) \) is defined by the programmer in accordance to the complexity of \( \text{op} \), the function \( w : \text{TYPE} \rightarrow \mathbb{N} \) associates a number to each type of variables and channels in the program to measure their complexity.

By scanning the program, we obtain the occurrence frequency of expressions, which can be regarded as another factor of criteria about busyness of expressions.

By scanning the program, we also gain the number of invocations of procedures. Through analysing the declarations of those procedures in the program, we get the complexity of their parameters. Suppose \( v_1 : T_1, \ldots, v_k : T_k \) is the list of parameters for some procedure, then the complexity of its parameters is \( \sum_{i=1}^{k} w(T_i) \).

It is also possible to define the complexity of procedures or blocks that do not contain iterations. If the number of loops can be predicted or estimated, the complexity of those which contain iterations can also be calculated.

**Definition 4.2** The complexity of subprograms (procedures/blocks) can be evaluated as follows.

\[
\text{com}(v := e) = w(\text{type}(v)) + w(=) + \text{complex}(e);
\]

\[
\text{com}(e; e') = w(\text{type}(e)) + \text{complex}(e);
\]

\[
\text{com}(e ? x) = w(\text{type}(e)) + \text{complex}(x);
\]

\[
\text{com}(S_1 ; S_2) = \text{com}(S_1) + \text{com}(S_2);
\]

\[
\text{com}(\text{if } b \text{ then } S_1 \text{ else } S_2) = w(b) + \max(\text{com}(S_1), \text{com}(S_2));
\]

\[
\text{com}(\text{while } b \text{ do } S) = w(b) \times \text{max}(\text{num of loops is } \text{maxN}, \infty), \text{ otherwise}.
\]

where \( w(=) \) is defined by the programmer.

Based on the three tables the analysis generates, the programmer can appropriately figure out those parts that should be implemented by hardware, in accordance with those guidelines listed before.

The second step of the analysis provides the following information about variables.

- the structural complexity of each variable
- the occurrence frequency of each variable
- the distributive information of each variable

We illustrate an industrial example in the following.

**Example 4.3** The source program is concerned with the design of an ATM switch. The code is illustrated in the appendix.

Provided \( \text{complex}(+) = \text{complex}(-) = \text{complex}(\leq) = \text{complex}(\land) = 10, \text{complex}(\land) = \text{complex}(+) = 5, \)

\( w(\text{int}) = 8, w(\text{Bool}) = 1, w(=) = 2, \) then the results of the analysis are listed below.

<table>
<thead>
<tr>
<th>expression</th>
<th>complex</th>
<th>num</th>
</tr>
</thead>
<tbody>
<tr>
<td>RP + 1</td>
<td>19</td>
<td>2</td>
</tr>
<tr>
<td>RC + 1</td>
<td>19</td>
<td>5</td>
</tr>
<tr>
<td>AC + 1</td>
<td>19</td>
<td>4</td>
</tr>
<tr>
<td>RM + 1</td>
<td>19</td>
<td>3</td>
</tr>
<tr>
<td>1.\sim\text{ok.LPP}</td>
<td>6</td>
<td>1</td>
</tr>
<tr>
<td>2.\text{ok.HPP} \land \text{ok.HPM}</td>
<td>7</td>
<td>1</td>
</tr>
<tr>
<td>3.\text{ok.HPP} \land \neg\text{ok.HPM} \land \text{ok.LPM}</td>
<td>18</td>
<td>1</td>
</tr>
<tr>
<td>4.\text{ok.HPP} \land \neg\text{ok.HPM} \land \neg\text{ok.LPM}</td>
<td>23</td>
<td>1</td>
</tr>
<tr>
<td>5.\sim\text{ok.HPP} \land \text{ok.LPP} \land \text{ok.LPM}</td>
<td>18</td>
<td>1</td>
</tr>
<tr>
<td>6.\sim\text{ok.HPP} \land \neg\text{ok.LPP}</td>
<td>17</td>
<td>1</td>
</tr>
<tr>
<td>7.\sim\text{ok.HPP} \land \neg\text{ok.LPP} \land \neg\text{ok.LPM}</td>
<td>28</td>
<td>1</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>procedure</th>
<th>num</th>
<th>complex of para.</th>
<th>complex of proc.</th>
</tr>
</thead>
<tbody>
<tr>
<td>GCRA</td>
<td>4</td>
<td>57</td>
<td>141</td>
</tr>
<tr>
<td>UPT</td>
<td>9</td>
<td>82</td>
<td>94</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>variable</th>
<th>num. complex</th>
<th>distribution</th>
</tr>
</thead>
<tbody>
<tr>
<td>VPI, VCI</td>
<td>2</td>
<td>input</td>
</tr>
<tr>
<td>GFC, PT, HEC, PL, QoS</td>
<td>4</td>
<td>input/output</td>
</tr>
<tr>
<td>aT</td>
<td>5</td>
<td>input</td>
</tr>
<tr>
<td>X.L.LCT</td>
<td>5</td>
<td>GCRA, input</td>
</tr>
<tr>
<td>I</td>
<td>4</td>
<td>GCRA, input</td>
</tr>
<tr>
<td>nX</td>
<td>8</td>
<td>GCRA, output</td>
</tr>
<tr>
<td>nLCT</td>
<td>7</td>
<td>GCRA, output</td>
</tr>
<tr>
<td>ok</td>
<td>22</td>
<td>1, 2, 3, 4, 5, 6, 7</td>
</tr>
<tr>
<td>CLP</td>
<td>6</td>
<td>UPT, input</td>
</tr>
<tr>
<td>nCLP</td>
<td>12</td>
<td>UPT, input/output</td>
</tr>
<tr>
<td>RP, RM, RC, AC</td>
<td>10</td>
<td>UPT</td>
</tr>
<tr>
<td>nRP, nRM, nRC, nAC</td>
<td>9</td>
<td>UPT</td>
</tr>
<tr>
<td>send</td>
<td>10</td>
<td>1</td>
</tr>
<tr>
<td>nVPI, nVCI, pN</td>
<td>6</td>
<td>input/output</td>
</tr>
</tbody>
</table>
The criterion of the interface partitioning is that a variable should be allocated to hardware if its structure is not complicated and it occurs in those procedures/blocks which are assigned to hardware more often than those ones that are left to software.

5. The Hardware/software Target Architecture

This section describes the target architecture of our partitioning approach which confines hardware and software components to specially chosen forms. To synchronize their activities, we introduce a simple handshaking protocol to streamline communications between them.

Suppose \( B = \{r_j, a_j \mid j \in I\} \) is a set of channels, we define \( CP(B) \) as the set of communicating processes \( C \) with Chan(C) \( \supseteq B \) and one of the following forms.

(1) a communicating process which does not use any channel in \( B \).

(2) \( r_j \mid e; C \mid a_j \neq a_i \), where \( C \) is a member of \( CP(B) \) not interacting via channels in \( B \).

(3) \( C_1 C_2 \) or \( C_1 \cap C_2 \), or if \( b \) \( C_1 \neq C_2 \), or \( (g_i, C_1)((g_2, C_2) \) for both \( g_i \) and \( C_i \) lie in \( CP(B) \), for \( i = 1, 2 \).

(4) \( b \cdot C \), where \( C \) is a member of \( CP(B) \).

To simplify the interface design, we confine the interactions between the hardware and software components to the communications along the channels from the set \( B \). Our partitioning rules will select the software components from the set \( CP(B) \), and organise the hardware component in the form of

\[
D = \mu X \cdot (\bigcup_{j \in I} (r_j x_j ; M_j \downarrow a_j \downarrow y_j ; X) \mid \text{skip})
\]

where none of \( M_j \) mentions channels in \( B \). The communicating process \( D \) represents a digital device which offers a set of services to its environment, each of which responds to a request from its environment on an input channel \( r_j \) by running the corresponding program \( M_j \) and delivering the result to the output channel \( a_j \) afterwards. The translation from such a hardware specification to netlists will be tackled using the hardware compilation techniques [11].

We denote as \( H(B) \) the set of those processes which own the same form as \( D \).

**Theorem 5.1** \( (C_1, C_2) \parallel D = (C_1 \parallel D) ; (C_2 \parallel D) \), for any \( C_1, C_2 \in CP(B) \).

**Proof** By structural induction on \( C_1 \).

(1) No channels in \( B \) appear in \( C_1 \).

\[
\begin{align*}
LHS & \equiv C_1 ; (C_2 \parallel D) \equiv \text{Co3.I}(C_2) \\
RHS & \equiv \text{Co3.I}(C_2)
\end{align*}
\]

(2) \( C_1 = r_j \mid e; C ; a_j \neq a_i \), for some \( r_j, a_j \in B \), \( C \in CP(B) \), and no channel in \( B \) occurs in \( C \).

\[
\begin{align*}
LHS & \equiv \text{Co3.I}(C_1) \\
RHS & \equiv \text{Co3.I}(C_1)
\end{align*}
\]

(3) \( C_1 = C_0_1 \parallel C_0_2 \), where \( C_0_1, C_0_2 \in CP(B) \).

We know \( C_0_1, C_0_2 \in CP(B) \). From the definition of \( CP(B) \), then

\[
\begin{align*}
LHS & \equiv \text{Co3.I}(C_1) \\
RHS & \equiv \text{Co3.I}(C_1)
\end{align*}
\]

(4) \( C_1 \) is one of the cases: (i) if \( b C_0_1 \) else \( C_0_2 \), (ii) \( C_0_1 \cap C_0_2 \), (iii) \( (g_1 C_0_1)((g_2 C_0_2) \), where \( C_0_1, C_0_2 \in CP(B) \). We demonstrate the first case here, others are similar [11].

\[
\begin{align*}
LHS & \equiv \text{Co3.I}(C_1) \\
RHS & \equiv \text{Co3.I}(C_1)
\end{align*}
\]

(5) \( C_1 = b \cdot C_0 \), where \( C_0 \in CP(B) \).

We define \( F(X) =_{df} b \cdot C_0 \) else \( \text{skip} \), and \( F^n(\text{chaos}, n \geq 0) = F^n(\text{chaos}, n \geq 0) \), and \( F^{n+1}(\text{chaos}) =_{df} F^n(\text{chaos}, n \geq 0) \).

Then \( C_1 = \mu X \cdot F(X) = \bigcup_{n \geq 0} F^n(\text{chaos}) \), and \( F^n(\text{chaos}) \in CP(B) \).

**Corollary 5.2** If \( C \in CP(B) \), then \( (b \cdot C) \parallel D = b \cdot (C \parallel D) \). The proof is presented in [12].

\[
\square
\]

6. Syntax-based Splitting Rules

This section discusses program splitting rules. First we show how the static analysis affects the partition of primitive commands into hardware and software components. Secondly we demonstrate how to construct hardware and software parts of a construct from those of its constituents. We establish the correctness of those rules by using the algebraic laws given in Section 3.

We introduce a predicate \( \text{Split} \), which will be of great help in formalising the decomposition rules.

**Definition 6.1 (Split)**

Let \( B = \{r_j, a_j \mid j \in I\} \). Given a sequential process \( S \), its hardware/software partition \( (C, D) \) is specified by the following predicate:

\[
\text{Split} \equiv (S, C, D) =_{df} S \subseteq (C \parallel D) \land C \in CP(B) \land D \in CP(B) \land \text{Var}(C) \cap \text{Var}(D) = \emptyset \land \text{Chan}(C) \cap \text{Chan}(D) = B \land \text{Input Chan}(C) \cap \text{Input Chan}(D) = \emptyset \land
\]

313
OutputChan(C) \cap OutputChan(D) = \emptyset

where InputChan(C) is the set of channels employed by C and only used for input tasks, OutputChan(C) is similar. □

6.1. The Bottom-up Splitting Approach

The bottom-up approach builds the hardware component from a program directly from the static analysis in one step, i.e., the hardware device is to provide all the services frequently used by the program. However, it constructs the software component from those of its constituents using the following rules.

**Bottom-up Rule for Sequential Composition**

\[
\text{\textit{Split}}_B(S_i, C_i, D) \quad i = 1, 2
\]

\[
\text{Var}(S_1) = \text{Var}(S_2), \quad \text{Chan}(C_1) = \text{Chan}(C_2)
\]

\[
\text{\textit{Split}}_B(S_i, C_i, C_1, C_2, D)
\]

\[
\text{Proof:}
\begin{align*}
S_1 & \subseteq S_2 \quad \{\text{is monotonic}\} \\
& \subseteq (C_1 \parallel D); \quad (C_2 \parallel D) \quad \text{(Th.5.1)} \\
& = (C_1 \parallel C_2) \parallel D \quad \square
\end{align*}
\]

**Bottom-up Rule for Conditional**

\[
\text{\textit{Split}}_B(S_i, C_i, D) \quad i = 1, 2
\]

\[
\text{Var}(S_1) = \text{Var}(S_2), \quad \text{Chan}(C_1) = \text{Chan}(C_2)
\]

\[
\text{Var}(b) \subseteq \text{Var}(C_1)
\]

\[
\text{\textit{Split}}_B(\text{if } b \text{ then } S_1 \text{ else } S_2, \text{ if } b \text{ then } C_1 \text{ else } C_2, D)
\]

\[
\text{Proof:}
\begin{align*}
\text{if } b \text{ then } S_1 \text{ else } S_2 & \quad \{\text{cond is monotonic}\} \\
& \subseteq (b \parallel D) \text{ else } (C_2 \parallel D) \quad \{\text{(1.15)}\} \\
& = (b \parallel C_2) \parallel D \quad \square
\end{align*}
\]

**Bottom-up Rule for Iteration**

\[
\text{\textit{Split}}_B(S, C, D)
\]

\[
\text{Var}(b) \subseteq \text{Var}(C)
\]

\[
\text{\textit{Split}}_B(b \ast S, b \ast C, D)
\]

When \text{Var}(b) \cap \text{Var}(D) \neq \emptyset, we will introduce a local variable \( b \), and rewrite the conditional and iteration into the forms:

\[
\text{var } b \ast (b := b; \text{ if } b \text{ then } S_1 \text{ else } S_2), \text{ and}
\]

\[
\text{var } b \ast (b := b; b \ast (S; b := b)) \text{ respectively by law DL1 and DL2. The partitioning rule for } b := b \text{ will be discussed later.}
\]

The non-deterministic choice can be regarded as a special case of guarded choice when all the guards are skip. We present the partitioning rule for guarded choice constructs as follows and omit the rule for non-deterministic choice.

**Bottom-up Rule for Guarded Choice**

\[
\text{\textit{Split}}_B(S_i, C_i, D) \quad i = 1, 2
\]

\[
\text{Var}(S_1) = \text{Var}(S_2), \quad \text{Chan}(C_1) = \text{Chan}(C_2)
\]

\[
\text{Var}(g_i) \subseteq \text{Var}(C_i), \quad i = 1, 2
\]

\[
\text{Chan}(g_i) \subseteq \text{Chan}(C_i), \quad i = 1, 2
\]

\[
\text{\textit{Split}}_B((g_1 S_1)(g_2 S_2), (g_1 C_1)(g_2 C_2), D)
\]

The proofs for the last two rules are straightforward and are presented in [112], due to the page limit.

6.2. The Top-down Splitting Approach

In this approach, both the hardware and software components of the source program are assembled from those of its constituents.

Before presenting a set of top-down splitting rules, we introduce the notion of interface-consistency on hardware components.

**Definition 6.2** For \( k = 1, 2 \), let

\[
D_k = \mu X \bullet (\langle i \in I \rightarrow (r_j x_i; M_i; a_i y_j; X) \rangle \text{skip}),
\]

\( D_1 \) and \( D_2 \) are said to be interface-consistent, denoted by

\[
\text{Consist}(D_1, D_2), \text{ if } \text{Var}(D_1) = \text{Var}(D_2), \text{ and}
\]

\[
\text{Chan}(D_1) \cup B_1 = \text{Chan}(D_2) \cup B_2.
\]

where \( B_k = \{r_j, a_j \mid j \in I_k\} \), for \( k = 1, 2 \).

In such a case, we define

\[
D = \text{union}(D_1, D_2) = \mu X \bullet (\langle i \in I \rightarrow (r_j x_i; M_i; a_i y_j; X) \rangle \text{skip})
\]

We first present a basic rule, from which and the bottom-up rules we obtain the corresponding top-down rule straightforwardly in each case.

**Rule for Hardware Augmentation**

\[
\text{\textit{Split}}_B(S, C, D)
\]

\[
\text{\textit{Consist}}(D_1, D_2), \quad \text{Chan}(C) \cap B_2 \subseteq B_1
\]

\[
\text{\textit{Split}}_{B_1 \cup B_2}(S, C, D)
\]

**Top-down Rule for Sequential Composition**

\[
\text{\textit{Split}}_B(S_i, C_i, D_i) \quad i = 1, 2
\]

\[
\text{Var}(S_1) = \text{Var}(S_2), \quad \text{Chan}(S_1) = \text{Chan}(S_2)
\]

\[
\text{Consist}(D_1, D_2)
\]

\[
\text{\textit{Split}}_{B_1 \cup B_2}(S_i, C_i, D_i)
\]

**Top-down Rule for Conditional**

\[
\text{\textit{Split}}_B(S_i, C_i, D_i) \quad i = 1, 2
\]

\[
\text{Var}(S_1) = \text{Var}(S_2), \quad \text{Chan}(S_1) = \text{Chan}(S_2)
\]

\[
\text{Consist}(D_1, D_2), \quad \text{Var}(b) \subseteq \text{Var}(C_i)
\]

\[
\text{\textit{Split}}_{B_1 \cup B_2}(b \ast S, b \ast C, D)
\]

**Top-down Rule for Guarded Choice**

\[
\text{\textit{Split}}_B(S_i, C_i, D_i) \quad i = 1, 2
\]

\[
\text{Var}(S_1) = \text{Var}(S_2), \quad \text{Chan}(S_1) = \text{Chan}(S_2)
\]

\[
\text{Consist}(D_1, D_2)
\]

\[
\text{Var}(g_i) \subseteq \text{Var}(C_i), \quad \text{Chan}(g_i) \subseteq \text{Chan}(C_i), \quad i = 1, 2
\]

\[
\text{\textit{Split}}_{B_1 \cup B_2}(g_1 S_1)(g_2 S_2), (g_1 C_1)(g_2 C_2), D)
\]
6.3. Splitting Primitive Commands

This section deals with primitive commands splitting. We only investigate the following nontrivial cases: the assignment, the invocation of a procedure, and the annotated blocks.

1. An assignment \( u := e(v) \)

We focus on the cases where both hardware and software participate in the evaluation of \( e(v) \) and the variable \( v \) has been allocated to the hardware component.

\[
\text{Split}_H(u := e(v), \ C, \ D) \text{ where} \\
C = \mu x \cdot (\langle r_j \rangle_{1, a_j \# u}) \text{ and} \\
D = \mu x \cdot (\langle r_j \rangle_{2, x}; y := e(v); a_j \# y; X \} \text{ skip})
\]

Case 2: \( e(v) \) is a "busy" expression, however, \( v \) has been allocated to the software component.

\[
\text{Split}_S(u := e(v), \ C, \ D) \text{ where} \\
C = \mu x \cdot (\langle r_j \rangle_{1, v}; a_j \# u), \text{ and} \\
D = \mu x \cdot (\langle r_j \rangle_{2, x}; y := e(x); a_j \# y; X \} \text{ skip})
\]

Case 3: \( e(v) \) is not a "busy" expression, but \( v \) is allocated to the hardware component.

\[
\text{Split}_H(u := e(v), \ C, \ D) \text{ where} \\
C = \lambda x \cdot (\langle r_j \rangle_{1, v} \text{ lift}; a_j \# u; u := e(lift)), \text{ and} \\
D = \mu x \cdot (\langle r_j \rangle_{2, x}; y := e(x); a_j \# y; X \} \text{ skip})
\]

More intricate case of assignment \( u := e(v, w) \), where \( v \) and \( w \) have respectively been allocated to the software component and the hardware one, will be converted to several successive assignments owning the form we have dealt with above, by the algebraic law with respect to assignments.

2. A procedure invocation

Without lose of generality, we investigate the invocation \( \text{proc}(e_s, e_H, v_S, v_H) \), where \( e_s \) is supplied by software, \( e_H \) is evaluated by hardware, \( v_S \) and \( v_H \) are allocated to software and hardware, respectively. We are interested in the case where the procedure is implemented by hardware.

\[
\text{Split}_H(\text{proc}(e_s, e_H, v_S, v_H), \ C, \ D) \text{ where} \\
C = \mu x \cdot (\langle r_j \rangle_{1, e_s; a_j \# v_S}) \text{ and} \\
D = \mu x \cdot (\langle r_j \rangle_{2, x}; \text{proc}(x, e_H, y, v_H); a_j \# y; X \} \text{ skip})
\]

3. An annotated block

We concentrate on the case where the block \( B(v_S, v_H) \) is predetermined to be implemented by hardware, and the variables that occur in the block \( v_S \) and \( v_H \) are allocated to software and hardware, respectively. We need to arrange the data flow between software and hardware.

\[
\text{Split}_H((B(v_S, v_H)), \ C, \ D) \text{ where} \\
C = \mu x \cdot (\langle r_j \rangle_{1, v_S; a_j \# v_H}) \text{ and} \\
D = \mu x \cdot (\langle r_j \rangle_{2, x}; y := x; B(y, v_H); a_j \# y; X \} \text{ skip})
\]

7. Conclusion

This paper shows how the hardware/software partitioning problem can be tackled in the algebra of programs.

The partitioning task consists of the static program analysis phase and the splitting phase, where the former provides the information for moving operations from software to hardware and reducing the communication between components, and the latter supports a compositional approach to the program partitioning. To synchronize software and hardware components, and reduce the complexity of their interface, we introduce a simple handshake protocol, and propose a normal form for the hardware components. The correctness of the splitting process is verified using the algebraic laws of the source language. To deal with co-design of embedded systems, we shall introduce timing constraints into our source program, which will result in timed hardware and software components.

References


8. Appendix

The source code in Example 4.3.

---
- ATM switch
- Variables
  - GFC, VPI, VCI, PT, CLP, HEC, PI - ATM fields
  - aT - arrival time of the cell
  - nVPI, nVCI - new id of the cell
  - QoS - quality of service
  - pN - Port number
  - HPP - variables of high priority peak policy
  - LPP - variables of low priority peak policy
  - HPN - variables of high priority mean policy
  - LPM - variables of low priority mean policy
  - RP/CRP - current/new value of rejected cells due to peak
  - RM/nRM - current/new value of rejected cells due to mean
  - RC/nRC - current/new value of rejected cells
  - AC/nAC - current/new value of accepted cells
  - send - boolean variable which decides whether the cell must be sent or not.
- Communication with the environment (channels)
  - chCell - receives the cell
  - chOut - sends the cell
  - ch1ReadTable - request data from the table
  - ch2ReadTable - receive the fields of the table
  - chRouteTable - receive the new identifiers and output port
  - chWTable - update the table

- Generic Cell Rate Algorithm — Leaky Bucket
  - X = bucket level
  - LCT = Last Conformance Time
  - ta = arrival time
  - I = increment
  - L = cell delay variation tolerance

procedure GCRA(In X, LCT, at, I, L: int, out ok: Bool, nLCT, nX: int)

begin
  var Xtmp: int; Xmp := X - (at - LCT);
  if (Xtmp < 0) then nX := I; nLCT := at; ok := true;
  else if (Xtmp <= L) then nX := Xmp + I; nLCT := at; ok := true;
  else nX := X; nLCT := LCT; ok := false;
end

procedure UPT(In t: Bool, p,m,n,r,a,c:l: int, out send: Bool, nRP, nRM, nRC, nAC, nCLP: int)

begin
  send := t; nRP := p; nRM := m;
  nRC := r; nAC := a; nCLP := c;
end

---

var GFC, VPI, VCI, PT, HEC, PI, aT, QoS: int,
  X, L, I, LCT, nX, nLCT:
  record of HPP, LPP, LPM:
  int end,
  ok, CLP, nCLP:
  record of HPP, LPP, LPM:
  Bool end,
  RP, nRP, RM, nRM, RC, nRC, AC, nAC:
  int, send:
  Bool,
  nVPI[3], nVCI[3], pN[3]: array of int;
  - Read the cell and the table
    chCell (? (GFC, VPI, VCI, PT, CLP, HEC, PI, aT); ch1ReadTable ! (VPI, VCI);
    ch2ReadTable ? (QoS, X, L, I, LCT, RP, RM, RC, AC);
    chRouteTable ? (nVPI[0], nVCI[0], pN[0]);
    chRouteTable ? (nVPI[1], nVCI[1], pN[1]);
    chRouteTable ? (nVPI[2], nVCI[2], pN[2]);
    GCRA(X, HPP, LCT, HPP, at, I, LPP, LHP,
       ok.HPP, nLCT, HPP, nX.HPP);
    GCRA(X, HPM, LCT, HPM, at, I, LPP, LHP,
       ok.HPM, nLCT, HPM, nX.HPM);
    GCRA(X, LPP, LCT, LPP, at, LPP, LPP,
       ok.LPP, nLCT, LPP, nX.LPP);
    GCRA(X, LPM, LCT, LPM, at, LPP, LPM,
       ok.LPM, nLCT, LPM, nX.LPM);
  if CLP
  if ~ ok.LPP UPT(false, RP+1, RM+1, RC+1, AC, CLP, send,
     nRP[nRM], nRC[nAC], nCLP);
  else if ok.LPP UPT(true, RP+1, RM+1, RC+1, AC, CLP,
    send, nRP[nRM], nRC[nAC], nCLP);
  else UPT(false, RP+1, RM+1, RC+1, AC, CLP,
    send, nRP[nRM], nRC[nAC], nCLP);
  else if ok.HPP & ok.HPM UPT(true, RP+1, RM+1, RC+1, AC+1, CLP,
    send, nRP[nRM], nRC[nAC], nCLP);
  else if ok.HPP ~ ok.HPM & ok.LPM UPT(true, RP+1, RM+1, RC+1, AC+1, send,
    nRP[nRM], nRC[nAC], nCLP);
  else if ok.HPP ~ ok.LPM UPT(true, RP+1, RM+1, RC+1, AC, send,
    nRP[nRM], nRC[nAC], nCLP);
  else if ~ ok.HPP & ok.LPM UPT(false, RP+1, RM+1, RC+1, AC, send,
    nRP[nRM], nRC[nAC], nCLP);
  else if ~ ok.HPP ~ ok.LPM UPT(true, RP+1, RM+1, RC+1, AC+1, send,
    nRP[nRM], nRC[nAC], nCLP);
  else skip;
  - Send the cell
  if send
    chOut ! (pN[0], QoS, GFC, nVPI[0], nVCI[0], PT, nCLP, HEC, PI);
    chOut ! (pN[1], QoS, GFC, nVPI[1], nVCI[1], PT, nCLP, HEC, PI);
    chOut ! (pN[2], QoS, GFC, nVPI[2], nVCI[2], PT, nCLP, HEC, PI);
    else skip;
  - Update the table
    chWTable ! (nX.HPP, nLCT, HPP, nX.HPM,
      nLCT, HPM, nX.LPP, nLCT, LPP); □