NRaD Robotics
Naval Command, Control and Ocean Surveillance Center
Research, Development, Test & Evaluation Division
San Diego, CA 92152-7383

Figure 1. Visual-motor functions and relationships
A smooth pursuit reflex, which takes input from motion in the foveal region, keeps the fovea centered on the acquired moving target. The optokinetic reflex, which responds to full field motion, stabilizes the eye when the body is in motion.
Reorientation of the robot to trail an acquired target is accomplished by basing commands to the robot drive motors on the camera pan angle, requiring the robot to drive in the direction of the gaze. This process is analogous to the targeting motion of the eyes, head and body in biological systems.
Trailing is accomplished by triggering forward thrust of the robot when the predominant motion of a centered target is toward the center of the visual field (contracting motion field). Collision is avoided by decreasing forward thrust when the target motion is away from the center (expanding). Obstacle avoidance is achieved by decreasing thrust on the side of the robot opposite to the peripheral motion away from the center of the visual field.
The obstacle avoidance reflex, which is transitory, assumes precedence over the pursuit reflex, allowing the robot to skirt around obstacles in pursuit of a target.
For an in-depth discussion of the biological visual processes from which we derived our algorithms, see Blackburn et al. [1993].
$$\mathrm{B0} = \max(0,\ R - H), \qquad \mathrm{B1} = \max(0,\ H - R). \tag{1}$$
The "on" elements indicate light intensity increasing in a localized region, while the "off" elements indicate decreasing intensity. The output matrix is organized into local receptive fields and submitted to a log-polar transformation [Blackburn, 1993a], in which the receptive field centers are placed proportionally further apart with increasing distance from the receptor matrix center, and the receptive field radii increase in proportion to the same distance.

$$\mathrm{G1}_{i,j} = \frac{1}{p} \sum_{a,b} \mathrm{B1}_{a,b}, \quad \text{s.t. } \lVert (a,b) - (x,y) \rVert \le \mathrm{RFr}, \tag{2}$$
where i and j are the coordinates in the transformed map, a and b are coordinates of elements located within the local receptive fields, x and y are locations of the local receptive field centers in the receptor matrix, and p is the variable number of elements in the local receptive fields. RFr is the radius of the local receptive fields, defined by:
$$\mathrm{RFr} = g \cdot E, \tag{3}$$
where g is a constant computed as
$$g = \left(2\left(1 - \cos(2\pi/m)\right)\right)^{1/2}$$
to ensure that, for m local receptive fields at any given eccentricity, the radius of each local receptive field reaches the center of the next local receptive field on the circumference.
The eccentricity (E) of a local receptive field, defined as the location of the field center relative to the center of the receptor matrix, varies exponentially with the serial position from the center along the radius of the receptor matrix (with the constraint of a finite packing density of elements near the center forcing each radius to be at least one element diameter greater than the previous).
$$E = \max\left(i,\ \exp(z \cdot i/n)\right), \tag{4}$$
where i is the serial distance on a radius from the receptor matrix center (from 1 to n), n defines the number of local receptive fields to be located on a radius from the receptor matrix center, and z = log(N/2), with N/2 representing the number of receptors (or pixel elements) available along the receptor matrix radius.
The x,y locations of the local receptive field centers on the receptor matrix are determined by
$$x = (N/2) - E \sin Q \tag{5}$$
$$y = (N/2) + E \cos Q, \tag{6}$$
where Q is incremented from $\pi/2$ to $5\pi/2$ in steps of $2\pi/m$. The locations of receptive field centers from one eccentricity to the next are staggered by $\pi/m$, so that a slightly asymmetric hexagonal matrix of receptive field centers results.
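The layout defined by Equations [3] through [6] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `receptive_field_layout` and the tuple layout are hypothetical, and the stagger is implemented by offsetting every other ring by half the angular spacing, which is one reading of the staggering described above rather than a confirmed detail of the original program.

```python
import math

def receptive_field_layout(N, n, m):
    """Return (i, j, x, y, radius) for n rings of m fields on an N x N matrix."""
    g = math.sqrt(2.0 * (1.0 - math.cos(2.0 * math.pi / m)))  # constant in Eq. 3
    z = math.log(N / 2.0)
    fields = []
    for i in range(1, n + 1):
        E = max(i, math.exp(z * i / n))        # Eq. 4: eccentricity of ring i
        radius = g * E                         # Eq. 3: field radius on ring i
        # Assumption: alternate rings are rotated by half the angular spacing.
        offset = (math.pi / m) * (i % 2)
        for j in range(m):
            Q = math.pi / 2.0 + offset + j * 2.0 * math.pi / m
            x = N / 2.0 - E * math.sin(Q)      # Eq. 5
            y = N / 2.0 + E * math.cos(Q)      # Eq. 6
            fields.append((i, j, x, y, radius))
    return fields
```

With this choice of g, the distance between two neighboring centers on the same ring equals the field radius, which is the property the constant is meant to guarantee.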
The averaging of pixels in receptive fields emphasizes large-magnitude effects. This is a desirable feature in building reliable artificial vision systems and may have been part of the reason for its adoption by nature.
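As a rough illustration of the averaging in Equation [2], the sketch below computes the mean of the elements that fall within one receptive field. The function name `field_average` and the list-of-lists representation of the input are assumptions made for the example; returning 0.0 for an empty field is also an assumption.

```python
import math

def field_average(B1, x, y, radius):
    """Mean of B1[a][b] over elements with ||(a, b) - (x, y)|| <= radius (Eq. 2)."""
    total, p = 0.0, 0
    for a, row in enumerate(B1):
        for b, value in enumerate(row):
            if math.hypot(a - x, b - y) <= radius:
                total += value
                p += 1          # p counts the elements inside this field
    # Assumption: an empty field contributes no activity.
    return total / p if p else 0.0
```

A single active element inside a large field is diluted by the field's element count p, so only changes that cover much of the field produce a strong response.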
Peripheral receptive fields are large and set far apart compared to the central receptive fields. Thus, the center of the receptor surface is more sensitive to slow motion, while the peripheral region is more sensitive to fast motion. The direction of motion on the log-polar plane can be assessed using a simple compare-to-threshold approach combined with feed forward facilitation or feedback inhibition relative to the preferred direction of the motion analyzer [Blackburn et al., 1987].
(Diagram of the motion analysis network.)
The direction of motion is determined by dynamic filtering. The filter elements (MAI, MAII, and MAIII) are defined by:
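$$\mathrm{MAI} = \mathrm{C1} \cdot \mathrm{MAI}(t-1) + \sum_{i,j} \mathrm{G1}_{i,j}, \tag{7}$$
$$\mathrm{MAII}_{i,j} = \mathrm{C2} \cdot \mathrm{MAII}_{i,j}(t-1) + \mathrm{G1}_{i,j} \tag{8}$$
$$\mathrm{MAIIIu}_{i,j} = \mathrm{C2} \cdot \mathrm{MAIIIu}_{i,j}(t-1) + \sum_{k} \frac{1}{k}\, \mathrm{G1}_{i+k,j} \tag{9}$$
$$\mathrm{MAIIId}_{i,j} = \mathrm{C2} \cdot \mathrm{MAIIId}_{i,j}(t-1) + \sum_{k} \frac{1}{k}\, \mathrm{G1}_{i-k,j} \tag{10}$$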
where u indicates a filter element supporting the detection of upward motion on the transformed map, d indicates downward motion, i and j index the location of elements, k indexes the offset of input elements in the +/- vertical directions, and C1 and C2 are constants of persistence (1.0 > C1 > C2 > 0). Upward motion on the transformed map results from motion toward the center of the receptor surface, while downward motion on the transformed map results from motion away from the center. (On- and off-center pathways are processed in parallel until the final output, when their products are combined. These equations are shown only for the on-center activity.)
The input to the motion analysis subnetwork on the subsequent increment of time is then passed through to the direction of motion detectors ($\mathrm{MAIVu}_{i,j}$ and $\mathrm{MAIVd}_{i,j}$) based upon the filter activity,
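$$\mathrm{MAIVu}_{i,j} = \max(0,\ \mathrm{C3} \cdot \mathrm{MAI} - \mathrm{MAIIIu}_{i,j}) \cdot \mathrm{G1}_{i,j} + \sum_{k} \max(0,\ \mathrm{MAII}_{i+k,j} - \mathrm{C3} \cdot \mathrm{MAI}) \cdot \frac{1}{k} \cdot \mathrm{G1}_{i,j}, \tag{11}$$
$$\mathrm{MAIVd}_{i,j} = \max(0,\ \mathrm{C3} \cdot \mathrm{MAI} - \mathrm{MAIIId}_{i,j}) \cdot \mathrm{G1}_{i,j} + \sum_{k} \max(0,\ \mathrm{MAII}_{i-k,j} - \mathrm{C3} \cdot \mathrm{MAI}) \cdot \frac{1}{k} \cdot \mathrm{G1}_{i,j}, \tag{12}$$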
where C3 is a gain constant (1.0 > C3 > 0). Equations [7] through [10] are duplicated for the off-center activity, and added to $\mathrm{MAIVu}_{i,j}$ and $\mathrm{MAIVd}_{i,j}$ as in equations [11] and [12].
One output of the motion analysis subnetwork ($\mathrm{MAVIu}_{i,j}$ and $\mathrm{MAVId}_{i,j}$) is the net positive difference of the opposite direction of motion detectors,
$$\mathrm{MAVIu}_{i,j} = \max(0,\ \mathrm{MAIVu}_{i,j} - \mathrm{MAIVd}_{i,j}) \tag{13}$$
$$\mathrm{MAVId}_{i,j} = \max(0,\ \mathrm{MAIVd}_{i,j} - \mathrm{MAIVu}_{i,j}). \tag{14}$$
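The on-center pathway of Equations [7] through [14] can be sketched as a single update step. This is a minimal illustration, not the original implementation: the names `motion_step`, `cell` and `zeros`, the default constants, the offset limit K, and the treatment of rows outside the map as zero are all assumptions. Following the text, the detectors are gated by the filter state left by the previous frame, and the filters are updated afterwards.

```python
def cell(grid, i, j):
    """Return grid[i][j], or 0.0 when row i is outside the map (an assumption)."""
    return grid[i][j] if 0 <= i < len(grid) else 0.0

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def motion_step(G1, state, C1=0.9, C2=0.5, C3=0.25, K=2):
    """One frame of the on-center motion analyzer.

    state is (MAI, MAII, MAIIIu, MAIIId) from the previous frame.
    Returns (new_state, MAVIu, MAVId).
    """
    MAI, MAII, M3u, M3d = state
    rows, cols = len(G1), len(G1[0])
    ks = range(1, K + 1)
    MAVIu, MAVId = zeros(rows, cols), zeros(rows, cols)
    for i in range(rows):
        for j in range(cols):
            g = G1[i][j]
            # Eqs. 11-12: gate the new input with the previous filter state.
            up = max(0.0, C3 * MAI - M3u[i][j]) * g
            dn = max(0.0, C3 * MAI - M3d[i][j]) * g
            for k in ks:
                up += max(0.0, cell(MAII, i + k, j) - C3 * MAI) * g / k
                dn += max(0.0, cell(MAII, i - k, j) - C3 * MAI) * g / k
            # Eqs. 13-14: net positive difference of the opposing detectors.
            MAVIu[i][j] = max(0.0, up - dn)
            MAVId[i][j] = max(0.0, dn - up)
    # Eqs. 7-10: update the filters with the current input.
    new_MAI = C1 * MAI + sum(sum(row) for row in G1)
    new_MAII = [[C2 * MAII[i][j] + G1[i][j] for j in range(cols)]
                for i in range(rows)]
    new_M3u = [[C2 * M3u[i][j] + sum(cell(G1, i + k, j) / k for k in ks)
                for j in range(cols)] for i in range(rows)]
    new_M3d = [[C2 * M3d[i][j] + sum(cell(G1, i - k, j) / k for k in ks)
                for j in range(cols)] for i in range(rows)]
    return (new_MAI, new_MAII, new_M3u, new_M3d), MAVIu, MAVId
```

Starting from a zero state, a single active element that moves from row 2 to row 1 on consecutive frames yields a positive MAVIu and a zero MAVId at row 1 with these illustrative constants. The outcome depends on the choice of C3, so the defaults should be treated as placeholders.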
Another output of the motion-analysis subnetwork ($\mathrm{MAVu}_{i,j}$ and $\mathrm{MAVd}_{i,j}$) is a measure of the motion contrast between the center and the surround of a local region. The sums of the local motion detectors in a neighborhood are taken for the opposing directions and compared. The larger represents the net or most likely direction of motion due to self movement through the environment. If the direction of motion of the center of the neighborhood is consistent with this net motion, then the center can be ignored; otherwise it likely signals unique target motion.
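$$\mathrm{MAVu}_{i,j} = \begin{cases} \mathrm{MAVIu}_{i,j}, & \text{if } \sum \mathrm{MAVIu}_{i,j} < \sum \mathrm{MAVId}_{i,j} \\ 0, & \text{else} \end{cases} \tag{15}$$
$$\mathrm{MAVd}_{i,j} = \begin{cases} \mathrm{MAVId}_{i,j}, & \text{if } \sum \mathrm{MAVId}_{i,j} < \sum \mathrm{MAVIu}_{i,j} \\ 0, & \text{else} \end{cases} \tag{16}$$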
The outputs of the MAV elements are sent to the target acquisition subnetwork, while the outputs of the MAVI elements are sent to the approach and avoidance subnetworks (described below).
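The center-surround comparison of Equations [15] and [16] might be sketched as follows. The function names and the square neighborhood of half-width `r` are assumptions made for illustration; the text does not state the neighborhood's size or shape.

```python
def neighborhood_sum(grid, i, j, r):
    """Sum of grid values in the square neighborhood of half-width r around (i, j)."""
    rows, cols = len(grid), len(grid[0])
    return sum(grid[a][b]
               for a in range(max(0, i - r), min(rows, i + r + 1))
               for b in range(max(0, j - r), min(cols, j + r + 1)))

def unique_motion(MAVIu, MAVId, r=1):
    """Keep a detector's output only when its direction is the local minority."""
    rows, cols = len(MAVIu), len(MAVIu[0])
    MAVu = [[0.0] * cols for _ in range(rows)]
    MAVd = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            up = neighborhood_sum(MAVIu, i, j, r)
            down = neighborhood_sum(MAVId, i, j, r)
            if up < down:                 # Eq. 15
                MAVu[i][j] = MAVIu[i][j]
            if down < up:                 # Eq. 16
                MAVd[i][j] = MAVId[i][j]
    return MAVu, MAVd
```

In a field dominated by downward motion, a lone upward response survives in MAVu while the downward background is suppressed in MAVd.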
The input to the target detection and centering subnetwork comes from the unique motion detectors ($\mathrm{MAVd}_{i,j}$ and $\mathrm{MAVu}_{i,j}$). These are weighted by the distances of their locations from the center of the receptor matrix and normalized by the sum of their potentials to find the location of the center of activity for target localization. A bias that is proportional to eccentricity is applied to the input to favor peripheral over central targets.
The input is retinotopically distributed and integrated over time, allowing excitation to build up in a local area,
$$\mathrm{OT\_in}_{i,j}(t) = \mathrm{C4} \cdot \mathrm{OT\_in}_{i,j}(t-1) + W_i \cdot (\mathrm{MAVd}_{i,j} + \mathrm{MAVu}_{i,j}) \tag{17}$$
where C4 is a constant of persistence (1.0 > C4 > 0), and $W_i$ is a bias factor that increases with eccentricity (i).
The required X and Y change in receptor matrix orientation (accomplished by camera pan and tilt commands) to center the matrix on a new target are
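$$dX = \frac{\sum_{i,j} \mathrm{x\_distance}_{i,j} \cdot \mathrm{OT\_in}_{i,j}}{\sum_{i,j} \mathrm{OT\_in}_{i,j}}, \qquad dY = \frac{\sum_{i,j} \mathrm{y\_distance}_{i,j} \cdot \mathrm{OT\_in}_{i,j}}{\sum_{i,j} \mathrm{OT\_in}_{i,j}} \tag{18}$$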
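A minimal sketch of the integration in Equation [17] and the activity-weighted centroid in Equation [18] follows. The function names, the default C4, and the convention of returning a zero shift when there is no activity are assumptions; the hemisphere-based noise gating is not included.

```python
def integrate_target_input(OT_prev, MAVu, MAVd, W, C4=0.8):
    """Eq. 17: leaky integration of unique-motion input, biased by ring weight W[i]."""
    return [[C4 * OT_prev[i][j] + W[i] * (MAVd[i][j] + MAVu[i][j])
             for j in range(len(OT_prev[i]))]
            for i in range(len(OT_prev))]

def centroid_shift(OT_in, x_distance, y_distance):
    """Eq. 18: activity-weighted mean offset (dX, dY) from the matrix center."""
    total = sum(sum(row) for row in OT_in)
    if total == 0.0:
        return 0.0, 0.0   # assumption: no activity means no camera movement
    dX = sum(x_distance[i][j] * OT_in[i][j]
             for i in range(len(OT_in)) for j in range(len(OT_in[i]))) / total
    dY = sum(y_distance[i][j] * OT_in[i][j]
             for i in range(len(OT_in)) for j in range(len(OT_in[i]))) / total
    return dX, dY
```

When a single location is active, the computed shift is simply that location's offset from the center, regardless of how strong the activity is.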
Noise is filtered from the subnetwork by disallowing contributions to dX and dY from one hemisphere if the sum of inputs in that hemisphere ($\sum_{i,j} \mathrm{OT\_in}_{i,j}$) is less than a dynamic threshold (Q). The threshold is increased whenever it is exceeded by the sum of inputs. Otherwise it dissipates like all other potentials with persistence in the network.
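$$Q = \begin{cases} \mathrm{C6} \cdot Q + \mathrm{C7} \cdot \sum_{i,j} \mathrm{OT\_in}_{i,j}, & \text{if } Q < \sum_{i,j} \mathrm{OT\_in}_{i,j} \\ \mathrm{C6} \cdot Q, & \text{else} \end{cases} \tag{19}$$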
where C6 is the threshold persistence and C7 is a gain factor
(1.0 > C7 > C6 > 0).
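The dynamic threshold of Equation [19] reduces to a short update rule. In the sketch below the function name and the default constants are illustrative assumptions, chosen only to satisfy 1.0 > C7 > C6 > 0.

```python
def update_threshold(Q, input_sum, C6=0.5, C7=0.8):
    """Eq. 19: raise the threshold when the input exceeds it, else let it decay."""
    if Q < input_sum:
        return C6 * Q + C7 * input_sum
    return C6 * Q
```

A strong burst of input raises Q immediately, so weaker activity on the following frames is treated as noise until Q has decayed again.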
The smooth pursuit mechanism receives its input from the motion analysis subnetwork. Due to errors inherent in the mechanical pan and tilt unit, slow pursuit is performed by adjusting the processing window within the available video frame. The rate of change of the video window (dU, dV) is computed by:
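$$dU = \mathrm{C8} \cdot \left(dU + \frac{\sum_{i,j} x_{i,j} \cdot \mathrm{RFr} \cdot \mathrm{MAVId}_{i,j}}{\sum_{i,j} \mathrm{MAVId}_{i,j}}\right) \tag{20}$$
$$dV = \mathrm{C8} \cdot \left(dV + \frac{\sum_{i,j} y_{i,j} \cdot \mathrm{RFr} \cdot \mathrm{MAVId}_{i,j}}{\sum_{i,j} \mathrm{MAVId}_{i,j}}\right) \tag{21}$$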
where x and y define the quadrant of the location of activity
(+/- 1), and C8 is a constant of persistence (1.0 > C8 > 0).
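The window-rate update of Equations [20] and [21] can be sketched as below. Several details are assumptions: the function name, the use of a per-ring radius list `rfr[i]` for RFr, the default C8, and the choice to let the rate simply decay when no motion is detected (which avoids a division by zero).

```python
def pursuit_rate(dU, dV, MAVId, xq, yq, rfr, C8=0.5):
    """Eqs. 20-21: update the video-window velocity from foveal motion.

    xq[i][j] and yq[i][j] are the quadrant signs (+1 or -1) of each location.
    """
    total = sum(sum(row) for row in MAVId)
    if total == 0.0:
        return C8 * dU, C8 * dV       # assumption: decay only when nothing moves
    cells = [(i, j) for i in range(len(MAVId)) for j in range(len(MAVId[i]))]
    sx = sum(xq[i][j] * rfr[i] * MAVId[i][j] for i, j in cells)
    sy = sum(yq[i][j] * rfr[i] * MAVId[i][j] for i, j in cells)
    return C8 * (dU + sx / total), C8 * (dV + sy / total)
```

Because the sums are normalized by total activity, the step size depends on where the motion lies and on the field radius there, not on how strong the motion response is.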
While the platform is moving through the environment, unilateral image flows away from the center of the receptor surface in the peripheral region indicate the presence of potential obstacles. The required response is to reduce the thrust on the contralateral drive motor, and increase the thrust on the ipsilateral drive motor. When traveling down a corridor with sufficient pattern contrast on the two walls, such a reflex would tend to keep the platform as nearly in the center of the corridor as possible.
The output of the motion analysis subnetwork ($\mathrm{MAVIu}_{i,j}$ and $\mathrm{MAVId}_{i,j}$) is also used to control the robot drive motors according to simple rules. Motor commands accumulate and dissipate according to
$$\mathrm{motor}_{L,R} = \mathrm{C5} \cdot \mathrm{input}(t-1) + \mathrm{input}, \tag{22}$$
where C5 is the persistence of the input (1.0 > C5 > 0). The input comes from the two hemi visual fields and causes an increase or decrease in thrust in both drive motors.
When either hemi visual field detects motion toward the center (indicating a receding target), thrust is increased to both motors in inverse proportion to the absolute value of the distance from the center to the location of the motion on the receptor surface
$$\mathrm{input} = +\sum_{i,j} \left(\mathrm{max\_dist} - \lvert \mathrm{x\_dist}_{i,j} \rvert\right) \cdot \mathrm{MAVIu}_{i,j}, \tag{23}$$
where max_dist is the greatest lateral extent of the receptor matrix. The sign of x_dist indicates the location of the motion on the left (-) or the right (+) of center.
When both hemi visual fields detect motion away from the center, thrust is decreased to both motors directly proportional to the absolute value of the distance from the center to the location of the motion on the receptor surface
$$\mathrm{input} = -\sum_{i,j} \lvert \mathrm{x\_dist}_{i,j} \rvert \cdot \mathrm{MAVId}_{i,j}. \tag{24}$$
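The approach and retreat rules of Equations [22] through [24] might look like this in code. The sketch represents detections as a flat list of hypothetical `(x_dist, activity)` pairs, reads Equation [23] as weighting each detection by `max_dist - abs(x_dist)` (the form used in Equations [25] and [26]), and tests the "both hemi visual fields" condition by looking for positive activity on each side. All names and the default C5 are assumptions.

```python
def approach_input(up_detections, max_dist):
    """Eq. 23: inward motion raises thrust, weighted toward the center."""
    return sum((max_dist - abs(x)) * a for x, a in up_detections)

def retreat_input(down_detections):
    """Eq. 24: outward motion in BOTH hemifields lowers thrust."""
    left = any(x < 0 and a > 0 for x, a in down_detections)
    right = any(x > 0 and a > 0 for x, a in down_detections)
    if not (left and right):
        return 0.0
    return -sum(abs(x) * a for x, a in down_detections)

def motor_command(previous_input, current_input, C5=0.5):
    """Eq. 22: motor command from the current input plus a decayed previous input."""
    return C5 * previous_input + current_input
```

Outward motion confined to one side produces no retreat input here; that case is left to the obstacle avoidance rule.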
Potential obstacles that are detected by asymmetric optic flow away from the center of the receptor matrix cause increased thrust on the same side (g) and decreased thrust on the side opposite (f) to the optic flow. These changes in thrust are transitory and non-zero only under the conditions of asymmetric optic flow, and during an active forward drive command. The degree of change, resulting in a turn away from the obstacle, is proportional to the net forward thrust.
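$$\mathrm{motor}_g(t) = \mathrm{motor}_g(t-1) + \frac{\mathrm{motor}_g(t-1)}{\mathrm{max\_thrust}} \cdot \sum_{i,j} \left(\mathrm{max\_dist} - \lvert \mathrm{x\_dist}_{i,j} \rvert\right) \cdot \mathrm{MAVId}_{i,j}, \tag{25}$$
$$\mathrm{motor}_f(t) = \mathrm{motor}_f(t-1) - \frac{\mathrm{motor}_f(t-1)}{\mathrm{max\_thrust}} \cdot \sum_{i,j} \left(\mathrm{max\_dist} - \lvert \mathrm{x\_dist}_{i,j} \rvert\right) \cdot \mathrm{MAVId}_{i,j}. \tag{26}$$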
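A minimal sketch of the avoidance adjustment in Equations [25] and [26] is given below. The function name and the flat list of hypothetical `(x_dist, activity)` pairs for the hemifield containing the outward flow are assumptions.

```python
def avoid_obstacle(motor_g, motor_f, flow, max_dist, max_thrust):
    """Eqs. 25-26: steer away from one-sided outward optic flow.

    motor_g is the motor on the same side as the flow, motor_f the opposite one.
    flow is a list of (x_dist, MAVId) pairs taken from the side with the flow.
    """
    drive = sum((max_dist - abs(x)) * a for x, a in flow)
    new_g = motor_g + (motor_g / max_thrust) * drive   # Eq. 25
    new_f = motor_f - (motor_f / max_thrust) * drive   # Eq. 26
    return new_g, new_f
```

Because each correction is scaled by the motor's current thrust, a stationary robot does not turn in response to optic flow.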
The turn command is transient and inversely proportional to the net forward thrust:
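$$\mathrm{motor}_L = \mathrm{motor}_L(t-1) + \mathrm{pan\_disp} \cdot \left(1.0 - \mathrm{motor}_L / \mathrm{max\_thrust}\right), \tag{27}$$
$$\mathrm{motor}_R = \mathrm{motor}_R(t-1) - \mathrm{pan\_disp} \cdot \left(1.0 - \mathrm{motor}_R / \mathrm{max\_thrust}\right). \tag{28}$$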
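The turn commands of Equations [27] and [28] can be sketched as follows. The function name is hypothetical, the thrust inside the parentheses is taken to be the previous motor value, and the sign convention of `pan_disp` (which pan direction counts as positive) is an assumption, since the text does not state it.

```python
def turn_toward_gaze(motor_L, motor_R, pan_disp, max_thrust):
    """Eqs. 27-28: differential turn command scaled down as thrust rises."""
    new_L = motor_L + pan_disp * (1.0 - motor_L / max_thrust)
    new_R = motor_R - pan_disp * (1.0 - motor_R / max_thrust)
    return new_L, new_R
```

At rest the full pan displacement is applied as a pivot, while at maximum thrust the turn term vanishes.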
We use a Transitions Research Corporation (TRC) Labmate Mobile Robot Base. A single CCD video camera with a 90 degree field of view, mounted on a pan and tilt mechanism built in-house, provides monocular input to the vision processing hardware. Camera position is taken from shaft encoders located on the pan and tilt axles. Wheel motion information is obtained from encoders located on both left and right drive motor axles. Vision processing hardware includes an Imaging Technologies OFG Frame Grabber coupled to a Hyperspeed Technology coprocessor board with two i860 microprocessors. The vision processing cards are hosted on an 80486 PC located in the robot housing. The PC provides I/O to the Labmate and to the pan and tilt controllers. The Hyperspeed board receives video data directly from the OFG board at frame rate over an ITI vision bus. One i860 processor is dedicated to subsampling the input frame and making decisions about the required motor responses, while the other i860 processor integrates the visual input into receptive fields and performs motion analysis. Pan, tilt and drive motor commands are sent to the 80486 for integration and execution.

Figure 2. The autonomous visually guided robot trailing a walking human in a cluttered environment.
Testing was performed in a large partitioned room with an open work area of 32 by 18 feet. Three walls of this work area contained windows, doors and office furniture. An example of target acquisition and pursuit is shown in the photographs of Figure 2. From a resting position the robot turned and moved forward in pursuit of a human walking into its visual space. Obstacle avoidance was disabled during this demonstration run to allow the robot to approach the cluttered desk. With obstacle avoidance in place the robot tended to approach the target only slowly, until the position of the target allowed the robot a clear run down the center of the floor.
Blackburn, M.R. [1993b]. "Machine visual targeting modeled on biological reflexes." NRaD TD 2455, Naval Command, Control and Ocean Surveillance Center, RDT&E Division, San Diego, California.
Blackburn, M.R., H.G. Nguyen and T.T. Tran [1993]. "Autonomous Mobile Robot Vision: Target Tracking, Trailing, and Obstacle Avoidance," in IR-IED '93 Annual Report, NRaD TD 2604, Naval Command, Control and Ocean Surveillance Center, RDT&E Division, San Diego, CA, pp.63-85.
Blackburn, M.R., H.G. Nguyen and P.K. Kaomea [1987]. "Machine visual motion detection modeled on vertebrate retina." SPIE Proceedings, vol. 980, pp. 90-98.