Sega Saturn to the limit (II)

First draft translation. Apologies for the mistakes. I will improve the text in future revisions.

NOTE: Long entry. In this post I have used a lot of information, some of it complex and in some respects imprecise. It is possible that I have made mistakes in interpretation or understanding. I will try to correct everything that comes up.

In this spreadsheet you have the data that I have collected from 321 games, out of approximately 1200 titles in total. That is approximately 20% of the total:

Index:

  1. The impossible to solve: Triangle vs Quadrangle.
    1.1 Triangle vs Quadrangle – EXTRA Ball: UV Mapping
  2. The least complicated: Gouraud shading and dynamic colour lighting.
  3. The complicated I: Smoothness in 3D games = 500 / 1.000 / 1.500 / ~2.000 quads / frame = 25/30/60 stable FPS.
    3.1 The complicated I – EXTRA Ball I: Use of the SCU-DSP
    3.2 The complicated I – EXTRA Ball II: Resolution screen SD / HD
    3.3 The complicated I – EXTRA Ball III: Tessellation / LOD scenario / Mip Mapping
  4. The complicated II: FMV full screen and colour quality.
    4.1 The complicated II – EXTRA Ball I: Advanced Color Calculation Function “Gradation / Boken / Blur”
  5. The most complicated: Transparencies and/or half-transparencies
    5.1 Transparencies and/or half-transparencies – EXTRA Ball I: Transparencies + Gouraud = “Table FOG” or Depth Cueing
    5.2 Transparencies and/or half-transparencies – EXTRA Ball II: Reflections in floors
  6. The most complicated II: Render-to-texture
  7. The difficult to “see”: Reverb and/or Echo sound effects
    7.1 The difficult to “see” – EXTRA Ball I: ADPCM and CD-ROM XA
  8. The difficult to “load”: Loading the game without stopping
    Conclusions
    Epilogue
    References
    Glossary

3. The complicated I: Smoothness in 3D games = 500 / 1.000 / 1.500 / ~2.000 quads / frame = 25/30/60 stable FPS.

In summary, the key to stability on both machines lies in these 4 points:

1) Get the most out of the transformation and lighting pipeline and of the code running on the available processors.

2) Optimize the calls between the CPU/CPUs and the GPU/VDP1-2 so that neither is left waiting for the other, keeping a continuous and constant dialogue between all the elements, without stalls.

3) Do not exceed the total cycle budget for drawing a frame, given the primitives plus effects within the pipeline.

4) Optimize the data flow on the BUS/BUSES and the use of DMA over them, without exceeding the maximum possible traffic and distributing the processes in the best possible way.

Bottleneck 1: Complexity 2x or 3x

Achieving a high or stable frame rate, resolution (screen and textures) and color quality (screen and textures) comparable to the PSX in similar versions was not an easy task. Having double (or triple) the variables to control in everything did not help the programmers of the time, starting with not having the triangle as the geometric primitive and ending with having double (or triple) the memory pools (“wells”) to manage.

They had to decide which variables to use in each game: the polygon load, the screen effects, the Gouraud shading, the dynamic lighting, the screen resolution and color depth, and the texture resolution and color depth. In any case, these choices made designing an engine a complicated task.

Add to that the variables inherent to the hardware, where the SS equivalent of each PSX component was doubled, if not more. To see it more clearly, I will make a simple but precise comparison:

  • Starting with the primitive. The triangle with 3 vertices in PSX, while in SS the quadrangle with 4 vertices.
  • Main CPU on PSX x1, on SS x2 (SH2), not counting SH1, exclusive for access to the CD-ROM and playback of MPEG-1 with an expansion card, which could not be programmed (as of today).
    • Speeds, bit width and Calculation power / MIPS (theoretical):
      • PSX:
        • MIPS R3000A 32bits 33MHz 30MIPS
          • MDEC video co-processor 80MIPS
      • SS:
        • SH2 32bits 28MHz 28MIPS each. 50MIPS total.
          • SCU-DSP 32bits 14Mhz 84MIPS.
  • The geometric co-processor (DSP): In PSX x1 (GTE), in SS x3 (2xSH2 and/or SCU-DSP).
    • Speeds and bit width (theoretical):
      • PSX within the MIPS together with the MDEC:
        • GTE 32bits 33MHz 16,5 MOPS* (2 cycles per MAC**) 41,25 MOPS (0.8 cycles equivalent DIV, with poor accuracy)
          • 0,94MOPS (25 cycles real DIV if MIPS is used)
      • SS:
        • 2xSH2 32bits 2x28MHz 2×14 MOPS (2 cycles per MAC) 2×14 MOPS (2 cycles per division DIV1 -> needs DIV0S or DIV0U before)
        • SCU-DSP 32bits 14Mhz 14 MOPS (1 cycle per MAC, 2 system cycles)
          • They work in parallel with the entire system.
          • Except that you can not access the SH2 Slave or vice versa.
            • The SCU for the rest of the main functions: 32bits 28Mhz.
        • Total MOPS for 2xSH2+SCU-DSP = 48 MOPS
        • Special non-programmable fixed DSP in VDP2: 28MOPS (Does several types of operations (MAC equivalents, DIV…) in 1 cycle, only for rotated 3D planes)
          * MOPS: Millions of operations per second
          ** MAC: Multiply-accumulate, a product and a sum performed as a single operation.
  • GPU or Graphics Processor: x1 in PSX, x2 in SS (VDP1 and VDP2). In many respects the VDP1 was similar to its PSX counterpart; in essence, however, the SS was 2D while the PSX was 3D. The difference lay in the ability to process triangles and UV texture mapping.
    • Speeds, bit width and Color <> Filling of pixels / Mpixels (theoretical):
      • PSX:
        • GPU 33 / 53Mhz* V1: 16 / 16bit V2: 32bit 15bit <> 33-66** Mpixels/s
          * The final speed of the GPU is not clear to me, since it has to compose the final image for the TV output at 50/60Hz. This may also explain the peak of up to 2 pixels per cycle. The online NO$PSX documentation says the following:

          • Video Clock
            The PSone / PAL video clock is the cpu clock multiplied by 11/7.
            CPU Clock = 33,868800MHz (44100Hz * 300h)
            Video Clock = 53,222400MHz (44100Hz * 300h * 11/7)
            ** 66Mpixels for Sprites without scaling or rotation, only moved and Flat Polygons (without texture).
      • SS:
        • VDP1: 28Mhz 16bit, 15bit color <> 28Mpixels/s. // 8bit <> 35* Mpixels/s
          * Framebuffer erase at 2 pixels per cycle.
        • VDP2: 28/57Mhz * 16bit, 24-4bit color <> 28** Mpixels/s.
          • 3D plane rotated and/or 24bit <> 28** Mpixels/s
            * The final speed of the VDP2 is not clear to me, since it has to compose the final image for the TV output at 50/60Hz.
            ** It has no penalty, because it has an internal margin of up to 8 accesses (read / write) within its pipeline, which is enough to perform all of its specified tasks.
  • Sound, on PSX x1 on SS x2 (M68000 and SCSP = Yamaha PCM / FM + DSP).
    • PSX: SPU 33MHz? 16bit
      • 24 voices, ADPCM (4:1 hardware decompression) plus preset Chorus and Reverb effects.
    • SS: SCSP 22Mhz 16bit. DSP 24bit.
      • 32 voices, PCM plus effects via a programmable DSP, an extra on the SS sound chip used for effects, even 3D sound. It was highly complex, of course.
      • CPU Sound M68000: 16bit 11MHz 2MIPS
      • 1x Internal DMA between M68000, SCSP and the Sound RAM
  • Graphic Memory or VRAM: In PSX 1x well, in SS 6x wells (3x VDP1 and 3x VDP2). The single well on PSX made semi-transparencies and framebuffer effects such as render-to-texture easier to use.
    • Speeds and bit width (theoretical):
      • PSX * Versions 1 and 2:
        • V1: 1MB VRAM (Dual-Port) 16bit 33MHz, 60ns cycles
        • V2: 1MB SGRAM 32bit 33MHz 30ns
      • SS: VRAM (SDRAM) 1,5MB Total VRAMs
        • VDP1:
          • VRAM*: 512KB 16bit 28MHz 34 ns
            * List Commands, Patterns (Textures), Gouraud Tables and CLUT.
          • FrameBuffer:
            • Back Buffer: 256KB 16-bit 28MHz 34ns
            • Front Buffer: 256KB 16-bit 28MHz 34ns
        • VDP2:
          • VRAM*: 2x 256KB 16-bit 28Mhz 34ns
            * Access both banks at the same time, divided into two parts. Being able to do up to 8 readings in a cycle, over: Patterns, Tiles, Cells, Planes, Maps, Coefficient Tables for Rotated planes…
          • Color RAM: 4KB* (inside the VDP2) Internal Bus 32-bit 28Mhz
            * Banks Palette Color or Coefficient Tables for Rotated planes
  • The main memory or RAM: PSX 1x well and SS 2x wells, same quantity: 2MB.
    • Speeds and bit width (theoretical):
      • PSX:
        • EDO DRAM 32-bit 33Mhz 60ns
      • SS:
        • High RAM: SDRAM 32bit 28Mhz 34ns
        • Low RAM: FPM DRAM 32bit 22Mhz 45ns cycles / 70ns access
  • The sound memory, on the other hand, was the same on both: 1x well each and the same amount, 512KB.
    • But the PSX could use hardware ADPCM compression, effectively having “more” memory, up to 4x.
      • Types and speeds:
        • PSX RAM ns?
        • SS FPM DRAM, 16bit, 20MHz, 50ns cycles, 70ns access
  • CD-ROM: equal speed and related technologies. The SS has a much more powerful dedicated controller and 512KB of buffer memory, in addition to the cache. This means the SS can load more data from the discs in less time.
    • Cache, controller and memory buffer:
      • PSX
        • 128KB cache
        • Motorola MC68HC05 (8bit single-chip CPU)
        • CD-Rom Buffer:?
      • SS
        • 64KB cache
        • SH1 32bit, 20MHz 20MIPS
        • CD-Rom Buffer: 512KB FPM DRAM, 16bit, 20MHz, 50ns cycles, 80ns access
  • And to top it off, the buses:
    • PSX 1x Bus for:
      • CPU + RAM + (GPU + VRAM) + (SPU + SoundRAM) + Block CD-Rom + Etc…
        • 7x (6x real) DMA channels, active 1x: CPU, MDEC, GPU, SPU, PIO and memories.
      • Speeds and bit width (theoretical):
        • 32bits 33Mhz 136MB/s
    • SS 3x Buses for:
      • SCU in the middle of the 3 buses.
      • 5x DMA channels:
        • In SCU up to 2x active at the same time:
          • 2x On elements (processors and memories) of the BUS-A, BUS-B, BUS-C.
          • 1x own internal SCU for the DSP and HighRAM.
        • 2x SH2 internal DMA on BUS-C with WorkRAM (H + L)
      • Bus-A: Cartridge + CD-Rom Block (CD-Rom + SH1 + CD-rom Cache + ROM)
      • Bus-B: (VDP1 + VRAM VDP1 + 2x FRAME-BUFFER) + (VDP2 + VRAM VDP2 + CRAM VDP2) + Sound Block (M68000 + (SCSP + DSP) + SoundRAM)
      • Bus-C (CPU): CPUs + RAM (Hi / Lo) + ROM + SMPC (manager of commands or inputs)
      • Speeds and bit width (theoretical):
        • SCU: 32bit 28Mhz 114MB/s
        • DSP: 32bit 14Mhz 57MB/s
        • Bus A: 16 / 32bits 28MHz 57-114MB/s
        • Bus B: 16 / 32bits 28MHz 57-114MB/s
        • Bus C (CPU): 16 / 32bits 28MHz 57-114MB/s
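The theoretical bandwidth figures in this list are consistent with a simple clock × bus width product. A minimal sketch of that arithmetic follows. The PSX clock is derived as in the NO$PSX quote above; the 28,6364 MHz value for the SS system clock is an assumption (the text only gives “28MHz”), and the function name is made up for the illustration.

```python
# Peak theoretical bandwidth = clock (MHz) * bus width (bytes).
# This is an upper bound: one full-width transfer on every clock cycle.

PSX_CLOCK_MHZ = 44100 * 0x300 / 1e6   # 33.8688 MHz, as in the NO$PSX quote
SS_CLOCK_MHZ = 28.6364                # assumption: the text only says "28MHz"


def peak_mb_per_s(clock_mhz: float, bus_bits: int) -> float:
    """Return the peak transfer rate in MB/s (1 MB = 10^6 bytes)."""
    return clock_mhz * bus_bits / 8


if __name__ == "__main__":
    print(f"PSX bus, 32-bit: {peak_mb_per_s(PSX_CLOCK_MHZ, 32):.1f} MB/s")
    print(f"SS bus, 32-bit:  {peak_mb_per_s(SS_CLOCK_MHZ, 32):.1f} MB/s")
    print(f"SS bus, 16-bit:  {peak_mb_per_s(SS_CLOCK_MHZ, 16):.1f} MB/s")
    print(f"SCU-DSP, 32-bit at half clock: "
          f"{peak_mb_per_s(SS_CLOCK_MHZ / 2, 32):.1f} MB/s")
```

These are ceilings only. Real traffic depends on wait states, DMA arbitration and the memory type behind each bus, none of which this sketch models.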

In summary, as we can see, the undertaking was becoming even more of an uphill struggle.

Bottleneck 2: Partial use of the total available power

In the ports, this was where the hard moment arrived: assessing the resources needed (and those available) and, faced with such a difference, deciding what stayed and what did not. Reducing color depth or resolution, on screen or in textures, was one way to fit a similar amount of textures to the PSX version and match its smoothness or frame rate.

The way memory and computing power were distributed in the SS did not help in matching the PSX feature for feature. Programmers tended to overload the calculation, the processes and the data flow onto a few of its processors and buses. Let’s say that, in order not to redo everything and to achieve the same result on SS at the minimum cost in time, developers fell into this bad practice, ending up using less processing capacity or memory than was actually available. In some cases half, or less, of the total available in the SS.

The issue was knowing how, or being willing, to make the most of the computing power the SS offered, because in raw numbers the SS was more powerful. The challenge was getting everything in tune to reach it. A majority of games used the 2x SH2 (of 267 games analyzed, 142 make clear use of the slave SH2 for 3D processing or FMV decompression, which is 56.4% of the total analyzed and which we could extrapolate to the whole catalog), but “few” did so efficiently.

Few companies used the SS SCU-DSP, which was very fast and could by itself match the PSX GTE in MAC calculation, essential at this stage of the pipeline. If we add the 2x SH2 for MAC calculation and DIV for the final projection, we would theoretically be even above the GTE.

And finally, in theory, the M68000 and the SH1 could somehow be used for 3D calculation as well. No game or case is known that does this, but in theory it would be possible at least with the M68000 that is on the same bus as the SH2 and SCU. Using the SH-1 might not be possible, since it sits on a different bus, although it was quite powerful.

Bottleneck 3: VDP1 Redraw

We also have to take into account the difference in how the PSX and the SS fill pixels, which also had to be considered when optimizing games. The problem derived from this difference was that the SS “wasted” pixel fill.

A textured or flat quad (called a polygon on the SS) is in essence a distorted sprite. The more the quad is deformed vertically, the more paint cycles the SS wastes. It also generates repainting errors in semi-transparency effects or even in the textures themselves, causing flickering. That was already normal in that era (PSX or PC) without mip-mapping, but here it came with a little more noise in the most deformed areas. This video by John Burton explains it perfectly.

Yet this brought an “advantage” for the SS that was a “disadvantage” for the PSX: this way of drawing meant that perspective distortion was considerably lower than on PSX. In any case, there were hardware and software solutions to mitigate this big problem of the VDP1:

  • Make clipping as efficient as possible. This problem and its techniques were common to the PSX, and implemented in hardware on both:
  • 2D clipping options used on SS: take advantage of the ability not to write in certain zones to save drawing time, implemented for both the VDP1 and VDP2: system clipping, user clipping and/or clipping windows. In hardware in the case of the SS. On PSX I imagine so too.
  • 3D clipping options used on SS: programmed in software on both machines. Frustum culling, near / far clipping, PVS, etc.
  • Hardware techniques specific to the SS, to compensate for the drawing wasted by the way it draws (distorted) 3D elements:
    • High Speed Shrink: saves calculation by skipping the even or odd pixels within a line that has been reduced. Only for distorted elements such as scaled sprites and distorted sprites. I have found many games that do not use it. The one that surprised me most was Tomb Raider *, and also Loaded! *, Blast Chamber * …
      * Perhaps where it is not used, it is because it misaligned the texels of the textures, and the developer preferred to optimize the VDP1 overdraw in other ways rather than lose rasterization quality.
    • Pre-Clipping: helps the VDP1 know where the primitive is so it can draw it faster.
    • End codes: save drawing some points in a line, or complete lines, of a pattern or texture.
  • Mip-mapping via software. In other words, using a lower-resolution texture when the object is farther away. Implemented in software on both machines. Seen more on PSX (also in software, but integrated in its SDK) than on SS. The downside was that it takes up more VRAM or CPU, but it reduces the drawing bottleneck on both machines.
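As an illustration of the software mip-mapping idea in the last point, here is a minimal sketch of choosing a texture version by distance. The function name, the “one level per doubling of distance” rule and the distances used are all assumptions made for the example; they do not come from either SDK.

```python
# Software mip-mapping sketch: pick a smaller pre-scaled texture as the
# object gets farther away. Level 0 is the full-resolution texture.
# The doubling rule below is an illustrative assumption, not an SDK rule.

def pick_mip_level(distance: float, base_distance: float, levels: int) -> int:
    """Move one level down for each doubling of distance past base_distance."""
    level = 0
    threshold = base_distance
    while distance >= 2 * threshold and level < levels - 1:
        threshold *= 2
        level += 1
    return level


if __name__ == "__main__":
    # Hypothetical texture chain stored in VRAM: 64x64, 32x32, 16x16.
    sizes = [64, 32, 16]
    for distance in (10, 40, 80, 500):
        level = pick_mip_level(distance, base_distance=16, levels=len(sizes))
        side = sizes[level]
        print(f"distance {distance}: level {level}, {side}x{side}, "
              f"{side * side} texels to read")
```

The trade-off described above is visible here: the extra, smaller copies take up VRAM and the selection costs CPU time, but distant objects need far fewer texel reads.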

But the biggest disadvantage is that the SS, when putting the same amount of information (geometry, textures…) on screen as the PSX without optimizing, could not maintain the necessary fill rate. Let us see why without going into very complicated calculations.

Nowadays we normally talk about texel rate for 3D graphics, which from my point of view is not applicable to these machines and this type of technology. We could talk about “cycles”, but that is also a dead end on these machines. Their VDP1/GPU drew 3D by brute force: drawing pixels as quickly as possible to draw sprites/patterns, superimposing these sprites/patterns and finally applying effects on them or on the superimposed result, namely Gouraud shading or semi-transparency. One did so with square sprites/patterns, the other with triangular ones.

Of course, the SS was forcing the traditional way of drawing patterns/sprites, while the PSX drew with a criterion better suited to 3D. Here lies one of the keys to the difference in effective final performance, and its variability, between the SS and the PSX.

For the rest, such as the final cost of drawing the primitives and their effects, the two are very similar, as we shall see.

Finally, since the exact “cycles” cannot be specified, my approximation is the “cost” that drawing each pattern/sprite has on each console according to the type of primitive or effect. From this I extrapolate the theoretical pixel fill in each case and compare the results to reach at least a minimally grounded conclusion, establishing the equivalence in the final cost of drawing a complete QUAD at 16bit color on both machines:

SS vs PSX – Equivalent Numbers QUADs & TRIs 16bit BPP_20181014_1.ods

My numbers are based on the maximum pixel fill data of each machine, and on the most accurate data I have found in the SDKs, documentation or official tutorials of each machine about the “cost” of each type of primitive or process. I have been very conservative with the data and have avoided overselling any interpretation.

SS VDP1: 28 Mpixels/s max. Theoretical. Primitive polygon = Quad

According to a formula that can be found in a tutorial made by several engineers for SoA / SoE:
Cycles = k + (x * y * l) + (y * n)

x / y = 8×8 in the tutorial. Height / Width in pixels of the pattern / texture

k = 70; This is the value used in the tutorial for texturing. It could break down as 16 cycles to fetch a command table from the VDP1 VRAM and another 16 to process it. That leaves 38 cycles, which could be the ones needed to execute the command table itself in the VDP1.

k = 70 (previous) + 232 = 302; These are the values used in the tutorial for textured + Gouraud. The 232 cycles could be those needed to apply the Gouraud table to the pattern or flat polygon, and the 302 the total cycles for texturing and applying the Gouraud table.

l = 3; This is the value used in the tutorial for texturing and Gouraud shading. A redraw penalty? An access to read VDP2 color or another memory? A read of VDP1 data (color, pattern or Gouraud)? A fixed or variable value?

n = 5; This is the value used in the tutorial for texturing and Gouraud shading. A penalty for redrawing and changing line? A fixed or variable value?

Why exactly are these “k”, “l” and “n” values used? I do not know.
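To make the formula concrete, here is a minimal sketch that evaluates it with the tutorial values quoted above. The per-frame estimate is only a rough ceiling under two assumptions that are not in the tutorial: a 28,6364 MHz VDP1 clock, and the VDP1 doing nothing but drawing identical quads.

```python
# Cycles = k + (x * y * l) + (y * n), with the tutorial values as defaults.

def vdp1_cycles(x: int, y: int, k: int = 70, l: int = 3, n: int = 5) -> int:
    """Estimated VDP1 cycles for an x-by-y pattern, per the tutorial formula."""
    return k + (x * y * l) + (y * n)


VDP1_HZ = 28_636_400  # assumption: the text only gives "28Mhz"


def quads_per_frame(cycles_per_quad: int, fps: int) -> int:
    """Rough ceiling: whole quads that fit in one frame's cycle budget."""
    return (VDP1_HZ // fps) // cycles_per_quad


if __name__ == "__main__":
    cost = vdp1_cycles(8, 8)
    print(cost)  # 70 + 192 + 40 = 302
    for fps in (30, 60):
        print(f"{fps} FPS: at most {quads_per_frame(cost, fps)} 8x8 quads")
```

Note that with k = 70, l = 3 and n = 5, an 8×8 pattern gives 70 + 192 + 40 = 302 cycles, and 192 + 40 = 232. These are the same 232 and 302 figures quoted above, so they may simply be the size-dependent terms and the total for the 8×8 example. Whether the tutorial intended them that way is unconfirmed.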

My colleague Urian offers a solid interpretation, and another member of the Beyond3D forum offers a different one with another approach, one resulting in a higher penalty than the other. As I say, to this day it remains a mystery.

With this formula, I have established the cost (unit) for the VDP1 to process each type of basic primitive:

Sprite cost = 1 / 0.75 *
* This penalty is not documented anywhere. But I understand that if scaling down has a cost on both the VDP2 and the GPU, it does on the VDP1 too, even though it has a dedicated primitive for it.
Flat polygon cost (8×1 pattern of one color) = 1
Textured polygon cost = 0.77
Polygon cost with Gouraud = 0.43
Polygon cost with Semi-transparency = 0.34-0.17*

* Between 3 and 6 times more. My theory is that it could be 3 times at most if the quad has no distortion, in other words, if nothing has been “redrawn”. Redraw is what would repeat the process several times on the same pixels, on top of the half-transparency itself. Without redraw, the VDP1 just has to read the background pixel, add it to the one being processed and divide the result by two. With redraw, it would be as the official Sega VDP1 manual says: it could cost 6 times more.
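The “read the background pixel, add, divide by two” step can be sketched as follows for 15-bit color. This is only an illustration of the arithmetic on three 5-bit channels; the function name is made up, bit 15 is ignored, and it does not model the actual timing of the VDP1.

```python
# Half-transparency as a per-channel average of two 15-bit RGB pixels.
# Each pixel packs three 5-bit channels in bits 0-4, 5-9 and 10-14.

def half_transparency(fg: int, bg: int) -> int:
    """Blend fg over bg at 50%: (fg + bg) / 2 on each 5-bit channel."""
    out = 0
    for shift in (0, 5, 10):
        f = (fg >> shift) & 0x1F   # channel of the pixel being drawn
        b = (bg >> shift) & 0x1F   # channel read back from the framebuffer
        out |= ((f + b) >> 1) << shift
    return out


if __name__ == "__main__":
    white, black = 0x7FFF, 0x0000
    print(hex(half_transparency(white, black)))  # 0x3def, a mid grey
```

The point of the example is the extra framebuffer read: an opaque pixel is a single write, while a blended pixel needs a read, the arithmetic and then the write. If the same destination pixel is visited more than once because of redraw, the whole sequence repeats and blends against its own earlier result.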

But how much have I estimated the cost of redraw to be, and why?

I have chosen a case where a quad in perspective recedes at more than 45º with respect to each axis. If it does not recede at all, it is viewed head-on by the camera, and therefore redraw = 0. But in 3D elements this case will be rare. I have looked for the worst possible case, a hard scenario for the VDP1. That is why I have estimated that a quad with a perspective of more than 45º on each axis can cause up to ⅔ of its pixels / texels to be redrawn, with or without texture or any VDP1 “color calculation”. That would be up to three times more work for “a single pixel”. A big penalty, and maybe too conservative.

Redraw penalty up to = 0,3

Finally, we have hardware tricks that help make the texturing stage faster. In the case of the SS, as we said earlier, this is HSS (High Speed Shrink). It is not a panacea, and it comes at a small cost in rasterization quality.

HSS ON = 1,4

Applying these unit values to the total fill rate, we obtain min / max figures.

The min values are those for the SS in the worst possible scenario, with the redraw penalty and a semi-transparency penalty of up to 6 times.

The max values are without the redraw penalty and with a semi-transparency penalty of up to 3 times.

Sprite = 28/21
Flat Polygon No-Text = 28/28
Flat Polygon + Texture = 21 / 30,18
Flat Polygon No-Text + Transparency = 1,43 / 4,76
Flat Polygon + Texture + Transparency = 1,10 / 5,13
Polygon + Gouraud = 3,61 / 12,04
Polygon + Gouraud + Texture = 2,78 / 12,98
Flat Polygon No-Text + Gouraud + Transparency = 0,61 / 2,05
Polygon + Gouraud + Texture + Transparency = 0,47 / 2,21
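The min / max list above can be reproduced by multiplying the 28 Mpixels/s peak by the unit costs. The sketch below is a reconstruction obtained by matching the listed figures, so its rules are assumptions rather than something stated in the text: costs combine by multiplication, the 0,3 redraw factor applies only to rows with Gouraud or semi-transparency, the 1,4 HSS factor applies only to textured rows, and the 0,17 semi-transparency cost is used in both columns.

```python
# Reconstruction of the SS VDP1 min/max fill figures (Mpixels/s).
# The combination rules are inferred from the listed numbers (assumptions).

VDP1_PEAK = 28.0        # Mpixels/s, theoretical maximum
TEXTURE = 0.77          # unit cost of texturing
GOURAUD = 0.43          # unit cost of Gouraud shading
TRANSPARENCY = 0.17     # unit cost of semi-transparency (6x case)
REDRAW = 0.3            # worst-case redraw penalty
HSS = 1.4               # High Speed Shrink gain on textured primitives


def vdp1_fill(texture=False, gouraud=False, transparency=False):
    """Return (min, max) fill rate for a quad with the given features."""
    rate = VDP1_PEAK
    if texture:
        rate *= TEXTURE
    if gouraud:
        rate *= GOURAUD
    if transparency:
        rate *= TRANSPARENCY
    worst = rate * REDRAW if (gouraud or transparency) else rate
    best = rate * HSS if texture else rate
    return round(worst, 2), round(best, 2)


if __name__ == "__main__":
    for flags in ((True, False, False), (False, True, False),
                  (True, True, False), (True, True, True)):
        print(flags, vdp1_fill(*flags))
```

Under these rules every row with Gouraud or semi-transparency matches the list to two decimals. The textured-only minimum comes out as 21,56, where the list shows 21.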

Now we go with Playstation:
PSX GPU: 33 MPixels / s max. Theoretical. Primitive polygon = Triangle; 2x Triangles = Quad

In a presentation of the SCEA Conference in March 1996 named “Advanced GPU” we can see:

Gross Numbers Ideal – Pixel Writing
33,8688 MegaPixels / Second
67,7376 MegaPixels / Second Sprite and Flat Polygons when they are not scaled or rotated.

Theoretical drawing speed when it is not scaled or rotated:

F3, F4, Sprites NOT scaled and / or rotated = 2 pixels / cycle = 66 megapixels / second
G3, GT3 in the cache* = 1 pixel / cycle = 33 megapixels / second
G3abr / GT3abr* = 0.5 pixel / cycle = 16 megapixels / second

Legend: 3 = Triangle; 4 = Quad; F = Flat shading; G = Gouraud shading; abr = Alpha Blending Ratio

* These data are confusing:
It is understood that transparency takes twice as long, hence half the rate. But it does not make clear whether the GT3 figure uses the texture cache or not. As it stands, it equates the performance of a Gouraud polygon without texture with that of one with texture, when Gouraud has a cost and texturing does too.
It also equates the performance of drawing a triangle and a quad, when the documentation and tutorials make clear that the PSX draws two triangles for a quad. Does it draw both at the same time?

What is clear is that the cache helps, and that without it there is a penalty.

That is why I add the following estimates, based on the rest of the evidence found in the tutorial and documentation, which we will see later:

F3, F4, Sprites Scaled and/or rotated = 1 pixel / cycle
FT3 without cache = 0,8 pixel/cycle
G3 / GT3 without cache = 0,5 – 0,3 pixel/cycle
G3abr / GT3abr without cache = 0,25 – 0,15pixel/cycle

In the documentation of the “Profiler Analizer” it is said that:

  • It takes 256 cycles max per frame or per polygon, which is not clear to me. Later, though, it speaks of between 200 and 300 cycles for a polygon. The exact sentence:
    • When 100,000 polygons per second will be displayed, for example, the length of a loop to process a polygon will be approximately 200 to 300 cycles.
    • This coincides, in part, with the SS cycles. I think it is good data.
  • Making a transparent pixel costs up to 3 times more. **
    ** For transparency we would then have two values: this one (3 times) and the previous one (2 times) from the tutorial. Both seem to me real and possible scenarios, perhaps extreme cases, depending on the area, the type of primitive or the type of color.
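The “200 to 300 cycles” figure can be checked against a simple cycle budget: clock rate divided by polygons per second. A minimal sketch, assuming the 33,8688 MHz CPU clock quoted from NO$PSX and ignoring everything else that competes for CPU time; the function name is made up for the illustration.

```python
# Cycle budget per polygon = clock rate / polygons per second.

PSX_CPU_HZ = 44100 * 0x300  # 33,868,800 Hz


def cycles_per_polygon(clock_hz: int, polygons_per_second: int) -> float:
    """Upper bound on the cycles available to process one polygon."""
    return clock_hz / polygons_per_second


if __name__ == "__main__":
    budget = cycles_per_polygon(PSX_CPU_HZ, 100_000)
    print(f"{budget:.1f} cycles available per polygon")
    print(f"{100_000 // 30} polygons per frame at 30 FPS")
```

At 100,000 polygons per second the ceiling is roughly 339 cycles per polygon, so a loop of 200 to 300 cycles fits within it with some margin left for the rest of the program.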

In the documentation “Run-Time Library Overview 4.4” in the section “Primitive Rendering Speed” you can see:

After this table we can find some simple calculations about the premise of number of readings and / or writings to draw a pattern of 100×100 pixels.

But all these data seem confusing and inaccurate, because they do not establish a relationship between the cost of texturing and that of adding Gouraud shading. They equate the real cost of texturing and then adding Gouraud with that of texturing alone or Gouraud shading alone, when the tutorial makes clear that Gouraud is a separate process within the pipeline. It is as if Gouraud were “instantaneous” once you have textured, which does not make sense.

I have assumed that there is a cost and tried to quantify it rationally, placing it between that of a flat polygon and that of a polygon with Gouraud.

Finally, there is another curious thing that for me is inadmissible. Where the previous table is marked with *, it assumes that the PSX GPU does not care whether it fills a triangle or a quad, when the tutorial and documentation make clear that a quad is filled as two triangles. That is, it assumes that making one triangle equals making two. This is nonsense; logically, I have assumed that it is twice as much work.

And the final numbers bear this out, establishing a difference or equivalence between the VDP1 and GPU that matches reality in the games that were equal and well programmed for each machine.

With this data, I have established the cost (unit) for the GPU to process each type of basic primitive:
Sprite cost = 2 / 0,75
Flat polygon4 cost = 0,5
Textured polygon4 cost = 0,4
Polygon4 cost with Gouraud = 0,25
Polygon4 cost with Semi-transparency = 0,25 / 0,17

Finally we have hardware tricks that help make the texturing stage faster. In the case of the PSX, it is the Textures Cache for 16×16 to 16bit color textures.

Texture Cache ON = 2

Applying these unit values to the total fill rate, we obtain min / max figures.
The min values are those for the PSX in the worst possible scenario, with the scaling / rotation penalty and a semi-transparency penalty of up to 3 times.
The max values are with the scaling / rotation penalty, the texture cache in use for elements with pattern / texture, and a semi-transparency penalty of up to 2 times.

Sprite = 66 / 24,75
Polygon4 Plane No-Text = 66/66
Polygon4 Plane + Texture = 13,20 / 26,40
Polygon4 Non-Text Plane + Transparency = 5,61 / 8,25
Polygon4 Plane + Texture + Transparency = 2,24 / 6,60
Polygon4 + Gouraud = 8,25 / 8,25
Polygon4 + Gouraud + Texture = 3,30 / 6,60
Polygon4 Flat No-Text + Gouraud + Transparency = 1,40 / 2,06
Polygon4 + Gouraud + Texture + Transparency = 0,58 / 1,65
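Most of the PSX list above also follows from multiplying the 33 Mpixels/s base rate by the unit costs. The sketch below is a reconstruction inferred from the listed figures, so its rules are assumptions: costs combine by multiplication, the min column uses the 0,17 semi-transparency cost and the max column the 0,25 one, and the ×2 texture cache gain applies only to the max of textured rows.

```python
# Reconstruction of the PSX GPU min/max fill figures for quads (Mpixels/s).
# The combination rules are inferred from the listed numbers (assumptions).

GPU_BASE = 33.0                 # Mpixels/s, 1 pixel per cycle
TEXTURE = 0.4                   # unit cost of a textured polygon4
GOURAUD = 0.25                  # unit cost of a Gouraud polygon4
TRANSP_WORST = 0.17             # semi-transparency, min column
TRANSP_BEST = 0.25              # semi-transparency, max column
CACHE = 2.0                     # texture cache gain, max column only


def gpu_quad_fill(texture=False, gouraud=False, transparency=False):
    """Return (min, max) fill rate for a quad with the given features."""
    rate = GPU_BASE
    if texture:
        rate *= TEXTURE
    if gouraud:
        rate *= GOURAUD
    worst = rate * (TRANSP_WORST if transparency else 1.0)
    best = rate * (TRANSP_BEST if transparency else 1.0)
    if texture:
        best *= CACHE
    return round(worst, 2), round(best, 2)


if __name__ == "__main__":
    for flags in ((True, False, False), (False, True, False),
                  (True, True, False), (True, False, True)):
        print(flags, gpu_quad_fill(*flags))
```

Two entries of the list are not reproduced by this product. The untextured flat row (66/66) corresponds to the 2 pixels/cycle peak rather than to the 0,5 unit cost, and the minimum of the last row comes out as 0,56 instead of the listed 0,58.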

The conclusions that I draw from these numbers are the following average data:

  1. Comparing only the SS-VDP1 against the PSX-GPU drawing quads, the PSX comes out 37% faster than the SS. Not twice as fast, as was said at the time and as I believed. And this includes the global redraw penalty that I have introduced for the VDP1, which is conservative with respect to a real case.
  2. If we add the SS-VDP2 to the calculation, the roles are reversed: the SS is 34% faster than the PSX overall. The VDP2 makes a big difference here through its contribution in 2D (backgrounds and UI) and in “3D” for infinite rotated planes. The raw throughput of the VDP2 is key, with no penalty in any of its processes if everything is programmed correctly. And the redraw penalty introduced on the VDP1 side still applies.
  3. If we use the VDP1 + VDP2 combination to apply transparencies to VDP1 elements, there is a clear improvement, lowering the difference to 20% even with the redraw penalty.
    • When using VDP2 transparencies for VDP1 elements, the total performance obtained by adding VDP1 + VDP2 decreases to 32%. This is when the half-transparency of the VDP1, with redraw penalty, is combined with the transparency of the VDP2.
    • With full transparency and no redraw, which is entirely possible, the final performance is clearly superior to the PSX.
  4. If we compare the equivalent 3D drawing of triangles, the numbers are clear: the PSX is between 87% and 77% faster than the SS, almost quadruple. Again with the conservative global redraw penalty that I have introduced.
  5. If we propose a scenario where the SS does not have the redraw problem, such as a fully 2D game that takes advantage of the SS, the SS is faster at drawing quads: 20% faster than the PSX. Adding the VDP2, as in the previous real scenario, the numbers grow further: the SS is 57% faster than the PSX overall, a little more than twice as fast.

Other bottlenecks: DMAs and SCSP

We are talking about stability in the graphics, but the management of the BUSes and the sound block also plays an essential role here.

Bad management of the BUS-B can be disastrous for the frame rate, causing drops even when neither the VDP1 nor the CPU has any real processing problem, or even affecting controller response or, ultimately, sound playback. Each access to the different destinations must be synchronized and optimized, all the more so when the clock speeds differ or the word-size limits of the accesses also differ.

The documentation details potential bottlenecks, including the limitation on DMA use with the SCSP and its internal DMA, since both work at different speeds from the rest of the system. Therefore, you have to be very careful when accessing the sound block.

In this regard, the first SCSP drivers were known to have problems, which did not help in keeping these issues under control.

Also, poorly optimized code for the sound block, such as overusing large PCM files out of a tendency to treat it like the PSX SPU, which is totally different, can result in too much traffic on the BUS-B and the sound block.

Combination of all bottlenecks:

Until recently I assumed that the VDP1 was the main cause in most of the cases I knew. But after some private conversations with XL2 about the development of the homebrew Sonic Z-Treme, and after re-analyzing critical games like Tomb Raider, I have been able to refine my conclusions.

Finally, we have the concrete causes of possible bottlenecks to control:

  1. Slowdowns by VDP1 Overdraw.
  2. Due to low optimization of the transformation matrix code and/or 3D lighting.
  3. Due to poor flow management CPU <> VDP1

In specific games, especially between 1995 and 1997, with PSX and SS versions identical (or almost) in geometry, lighting, shading, color and textures, we can find:

  1. Drops to half on SS, at peaks of high geometry on screen where the PSX also shows a clear drop.
    • Tomb Raider → 30/24 PSX vs 30/15 SS ← Possibly due to poor optimization of the transformation matrix code and/or 3D lighting, and/or poor CPU <> VDP1 flow management
    • Hardcore 4×4 → 30/20 PSX vs. 30/15 SS
  2. Or cases where the PSX stays smooth with some slight drop, while the SS can drop by ⅓ = 5 to 10FPS.
    • Wipeout 1 → 30 PSX vs 20 SS
    • Wipeout 2 → 30 PSX vs 20 SS
    • Loaded! → 30/25 PSX vs 30/20 SS
  3. Exceptions where the SS is slightly higher, or higher, than the PSX:

Finally, an example that I find curious in general: Hardcore 4×4. It moves a very high number of polygons with lighting and Gouraud shading, reaching peaks of 1500 quads (equivalent to some 3.000 triangles on PSX) with the same textures. It runs at up to 30FPS in-game, but with drops from 30FPS to 10FPS when all the cars are on screen. The drop is very noticeable, hurting control response and affecting gameplay. Most 3D elements of the menus run at 60FPS with slight drops.

Hardcore 4×4 on PSX also has drops in both the game and the menus, but not as abrupt: it stays at 30FPS for longer and drops to 20FPS only very occasionally. With all the cars on screen it holds a minimum of 25FPS, with a few exceptions.

In Hardcore 4×4 we may be facing a combination of all the problems: poor optimization of the transformation and lighting stage, poor CPU <> BUS-B flow management, and VDP1 redraw.

What happens when the FPS go down a lot (a lottttt)?

Most likely, the cause is the vertical synchronization in use. The native hardware V-sync of the SS is a locked double buffer, fixed or variable. This gives two usage scenarios on the SS; the PSX implemented something similar, but in software:

  1. Double Buffer Without interlacing:
    In this case we halve the available resolution (vertical and/or horizontal), but we keep the maximum color depth, and the color calculation and shading functions of the VDP1 remain available.
  2. Double Buffer With interlacing:
    The advantage of this buffer is that it allows more resolution (vertical and/or horizontal) with less color depth. The problem is maintaining the optimum pixel fill rate. On the SS, with the frame buffer split in two, this is the case when we use the Hi-Res mode or the interlaced mode at maximum resolution. But the color depth cannot be kept at 16BPP: it drops to 8BPP palette color, in this case from the VDP2 palette, and the color calculation and shading functions of the VDP1 are lost.

In both cases we have a full back buffer for when the VDP1 or the CPU cannot maintain the optimal drawing rate, which can be used to store one extra frame or field, or even to use the V or H blanking interval for some special effect or general system processing.

V-Sync Double Buffer Unlocked * vs Double Buffer Locked.

  1. V-Sync Double Buffer Unlocked *:
    Having the FPS unlocked makes it easier to skip frames when the system cannot maintain a constant frame rate. The downside is that the variations can be large. The main problem is that the chance of vertical desynchronization and tearing increases.
  2. V-Sync Locked Double Buffer:
    Locking the V-sync allows better adjustment to the vertical sweep and eliminates tearing. The disadvantage is that the program must be very well optimized so that it does not stall at any of the key places of the pipeline. In addition, skipping frames in this case is complicated, although not impossible.

* Unlocked = Frame skipping… that is, when you cannot fill the 50 / 60 Hz and still want to go at “the same” speed, you have to stop drawing some frames and repeat previous ones x times, or leave frames half-drawn and jump to the next.

V-sync Fixed or Dynamic Locked Double Buffer:

  1. V-sync Fixed Locked Double Buffer:
    The lock is set to a specific FPS amount.
    As a drawback, when the program cannot draw the fixed amount of FPS in time, it suffers a slowdown in the form of slow motion: the game slows to the rate at which the system is able to draw at that moment. The same frames are drawn, with none skipped, hence the feeling of slow motion.
  2. V-sync Dynamic Locked Double Buffer:
    We can have set certain FPS jumps, which change according to the drawing time or determined by some other variable.
    These steps are usually 1/k, 2/k, 3/k, 4/k… of a second, where k is the refresh rate in Hz of the video system: PAL = 50 / NTSC = 60. This gives the time available to draw one frame or field in order to hold a given target FPS. Following this for PAL / NTSC, the resulting rates are: 50/60, 25/30, 17/20, 12/15, 10/12…
    As an advantage we avoid slow motion, but the FPS will not stay the same all the time, which changes the feeling of speed of the game while keeping a more homogeneous smoothness. In addition, if this is not taken into account, the control response may also change, which can worsen response time and gameplay in games where they are critical.
    But the BIG problem is that with small frame-time overruns this scheme quickly drops to the next lower interval, giving the feeling of a bigger FPS drop than the real one, because from the top rate the drop is always to half. This is done to avoid tearing and to keep the total FPS aligned with the refresh rate of the video system.
    These kinds of problems affect PSX and SS equally. It is true that when a game with fixed V-sync has slight drops on PSX, they can go unnoticed, while the same game, unoptimized, will show an even clearer FPS drop on SS.
    If the V-sync is dynamic, where the PSX stays at or above the FPS of a higher interval, the same game, unoptimized, will tend to drop to the next lower interval more often on SS, and therefore more visibly.
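The interval steps described above can be sketched with a few lines of Python. This is only an illustrative model, not code from any SS or PSX SDK: the function names are hypothetical, and it assumes a frame is shown only on a vertical blank, so a frame that takes slightly longer than one refresh period has to wait for the next one.

```python
import math

def fps_steps(refresh_hz, count=5):
    """Frame rates available under a locked V-sync: k/1, k/2, k/3..."""
    return [round(refresh_hz / n, 1) for n in range(1, count + 1)]

def locked_fps(refresh_hz, render_ms):
    """Effective FPS when each frame must wait for a whole number of vblanks.

    Assumption: the buffer swap only happens on a vertical blank.
    """
    vblank_ms = 1000.0 / refresh_hz
    # Whole refresh periods occupied by the frame (at least one).
    # The tiny epsilon keeps an exact fit from being rounded up.
    vblanks = max(1, math.ceil(render_ms / vblank_ms - 1e-9))
    return refresh_hz / vblanks

# NTSC: a frame drawn in 16 ms holds 60 FPS,
# but one drawn in 17 ms already falls to 30 FPS.
ntsc_ok = locked_fps(60, 16.0)
ntsc_late = locked_fps(60, 17.0)
```

In this model a frame that is only about 2% late costs half of the frame rate, which is the effect described for small overruns.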

Conclusion about the V-sync:

A) The ideal implementation would be an adaptive double-buffer V-sync (semi-unlocked*):

Ideally, it would keep the advantages of the locked double-buffer V-sync, mainly the flexibility it gives to the system’s graphics pipeline, and add the option of unlocking for small frame drops, so as not to jump to the next lower interval, which is half of the current one, at the cost of generating some tearing at those moments.

B) And compensate for the FPS mismatches in the final control response, without adding latency, or at least without exceeding a controlled and established threshold, ideally between 16 and 32 ms.
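A minimal sketch of the semi-unlocked idea in point A, under stated assumptions: the `tolerance_ms` parameter and the function name are hypothetical, and the model simply swaps buffers immediately (accepting tearing) when a frame misses the vertical blank by less than the tolerance.

```python
import math

def adaptive_fps(refresh_hz, render_ms, tolerance_ms=2.0):
    """Return (fps, tearing) for a semi-unlocked double buffer.

    tolerance_ms is an assumed tuning value, not a documented one.
    """
    vblank_ms = 1000.0 / refresh_hz
    vblanks = max(1, math.ceil(render_ms / vblank_ms - 1e-9))
    # How far past the last vblank boundary the frame finished.
    overrun_ms = render_ms - (vblanks - 1) * vblank_ms
    if vblanks > 1 and overrun_ms <= tolerance_ms:
        # Small miss: swap right away, tear once, keep close to the higher rate.
        return 1000.0 / render_ms, True
    # On time, or a large miss: behave like the locked double buffer.
    return refresh_hz / vblanks, False
```

With these assumptions a 17 ms frame at 60 Hz stays near 59 FPS with tearing instead of falling to 30 FPS, while a 25 ms frame still drops to the locked 30 FPS step.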

Real practical examples of performance:

Before moving on to the examples, I want to bring in the VDP2.

So far in this point I have spoken extensively about the VDP1, since it is the part that draws the 3D on SS. But when we want to compare performance between games exactly, we have to keep in mind that backgrounds (2D or 3D), the interface or the HUD on SS are drawn (ideally) by the VDP2, a processor even more 2D-oriented than the VDP1 already was.

I will explain the criterion I use to calculate equivalent textured quads for VDP2 backgrounds. We can use two criteria:

1. As I did for 3D games, counting 2D elements as on the PSX, or the VDP2 equivalent such as the 3D plane used for floors:

  • Backgrounds: 4×4 = 16 textured quads = 32 textured triangles
  • Floors (can be lit): 8×8 = 64 textured quads = 128 textured triangles
    • Some games build floors at 20×20, unlit = 400 textured quads = 800 textured triangles

2. As I did for 2D games, counting “2D” tiles as on the PSX for backgrounds, layers or elements:

  • 2D game background: 20×20 = 400 textured quads = 800 textured triangles

I opt for the 2nd option, because the VDP2 actually draws high-density tiles (cells, characters or patterns) of 8×8 or 16×16 to build these planes. The first option would not reflect the detail that the VDP2 really gives us. And in the case of the VDP2 3D plane, the PSX will never reach the perspective correction quality that the VDP2 achieves.

Practical example of PSX:

Let’s start with PSX and the actual geometry of a Motor Toon 2 scene/frame, taken from an example in the PSX Profiler Analyzer manual from Polyphony Digital:

Triangles: 30 Flat + 41 FlatText + 100 GsFlat + 63 GsText = 234 triangles

Quads: 112 F + 240 FT + 337 GsF + 5 GsT = 694 quads = 1.388 triangles

Other: 12 Lines + 100 Sprites = 100 quads

Flat polygons = 254 triangles

Polygons with Texture = 521 triangles

Flat polygons + Gouraud = 774 triangles

Polygons with Texture + Gouraud = 73 triangles

Totals drawn and calculated: 1.834 triangles per frame → 917 quads per frame

Totals transformed, illuminated, drawn and not drawn but calculated:

1.834×1.5 * = 2.751 triangles per frame → 1.375 quads per frame

Total at 30 FPS = 2.751 x 30 = 82.530 triangles / sec → 41.265 quads / sec

* 1.5 = Factor to add the polygons that are calculated (transformation and lighting) but not drawn.
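The tally above can be reproduced with a short script. It is only a worked check of the arithmetic: the variable names are mine, and counting each of the 12 lines as one triangle is an assumption made here because it is what makes the drawn total come out at 1.834.

```python
# Primitive counts from the Motor Toon 2 frame quoted above.
triangles = {"flat": 30, "flat_tex": 41, "gouraud": 100, "gouraud_tex": 63}
quads = {"flat": 112, "flat_tex": 240, "gouraud": 337, "gouraud_tex": 5}
lines, sprites = 12, 100

def as_triangles(kind):
    """Triangles plus quads of one shading type, with each quad as 2 triangles."""
    return triangles[kind] + 2 * quads[kind]

per_type = {kind: as_triangles(kind) for kind in triangles}

# Sprites count as quads (2 triangles each).
# Assumption: each line is counted as a single triangle.
drawn = sum(per_type.values()) + 2 * sprites + lines

UNDRAWN_FACTOR = 1.5  # calculated (T&L) but not drawn, as defined above
calculated = int(drawn * UNDRAWN_FACTOR)
per_second = calculated * 30
```

Running it gives 254 / 521 / 774 / 73 triangles per shading type, 1834 drawn, 2751 calculated per frame and 82530 triangles per second at 30 FPS.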

Practical example of SS:

Sonic R runs at a fully stable 30 FPS: 1300 x 30 x 1.5 = 58.500 quads / second.

Not many, are they? But they are quads, not triangles. When talking about polygons on PSX, we talk about triangles. Multiply by 2 again: 117.000 triangles / second, which is not bad at all, since Sonic R had textured quads with Gouraud shading plus dynamic lighting with a 16-bit color light source.

More specifically, Sonic R handles these figures per frame (rounded) in the VDP1 drawing process:

Flat Distorted Sprite (stage and characters) = 50 quads

Distorted Sprite with Texture (stage and characters) = 150 quads

Flat Distorted Sprite + Gouraud = 500 quads

a) Color light source (characters) = 165 quads

b) Precalculated light (stage) = 335 quads

Distorted Sprite with Texture + Gouraud = 500 quads

a) Color light source (characters) = 165 quads

b) Precalculated light (stage) = 335 quads

Scaled Sprite = 100 quads

Real totals, drawn by the VDP1:

1 player: 1.266 quads max @ 30 = 37.980 quads / sec

Split screen for 2 players: 1.374 quads max @ 30 = 41.220 quads / sec.
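As a quick check of the Sonic R figures, the rounded breakdown and the per-second totals can be recomputed as follows. The dictionary keys and the helper name are only illustrative labels.

```python
# Rounded per-frame VDP1 figures for Sonic R quoted above (quads).
sonic_r_frame = {
    "flat": 50,
    "textured": 150,
    "flat_gouraud": 500,      # 165 characters + 335 stage
    "textured_gouraud": 500,  # 165 characters + 335 stage
    "scaled_sprites": 100,
}

def quads_per_second(quads_per_frame, fps=30, factor=1.0):
    """Quads per second; factor=1.5 adds the calculated-but-not-drawn share."""
    return quads_per_frame * fps * factor

rounded_total = sum(sonic_r_frame.values())                 # rounded frame total
with_undrawn = quads_per_second(rounded_total, factor=1.5)  # 1300 x 30 x 1.5
one_player = quads_per_second(1266)                         # measured maximum
two_players = quads_per_second(1374)                        # split screen maximum
```

The rounded breakdown adds up to the 1300 quads used in the 58.500 quads / second estimate, and the measured maximums give the two real totals listed above.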

And that is without counting that the SS drew an “infinite” 3D plane for the ground, plus the background and the UI, with the VDP2 planes. On PSX these were polygons, and in SS equivalence they would be even more “polygons” on screen, more still in the case of infinite planes. We could be talking about the following numbers:

1) These are the maximum figures for each VDP2 plane in use in Sonic R:

  • RBG0: “infinite” 3D flat ground = 8×8 cell / Pattern Size (1 word) / Tile (2H x 2V = 64×64 cells) / Plane size (2H x 2V = 128×128 cells) / Map with 16 (4×4) planes (512×512 cells) = 262.144 quads text 8×8 or 131.072 quads text 16×16
  • NBG0: Background = 8×8 cell / Pattern Size (2 word) / Tile (2H x 2V = 64×64 cells) / Plane size 2×1 (128×64 cells) / Map with 4 (2×2) planes (256×128 cells) = 32.768 quads text 8×8 or 16.384 quads text 16×16
  • NBG2: UI = 8×8 cell / Pattern Size (1 word) / Tile (1H x 1V = 64×64 cells) / Plane size 1×1 (64×64 cells) / Map with 4 (2×2) planes (128×128 cells) = 16.384 quads text 8×8 or 8.192 quads text 16×16

2) Now let us estimate how many are really in use, depending on the amount of information shown on screen:

  • The “infinite” 3D flat floor can be considered to use the RBG0 plane in its entirety. It reaches the horizon, and its variety and quality are clear.
  • The background is also used excellently, in detail and variety as well as in effects. NBG0 is likewise being used in full.
  • The UI is clearly not used to the maximum: NBG2 is only using ⅓ of its total, giving a final figure of 5.461 quads text 8×8 or 2.730 quads text 16×16

Total VDP2 equivalent textured quads = 300.373 quads text 8×8 or 150.186 quads text 16×16

Totals transformed, illuminated, drawn and not drawn but calculated in VDP1:

1.374×1.5 = 2.061 quads per frame

Total VDP1 + VDP2 = 2.061 + 300.373 = 302.434 quads per frame

Total for 30FPS = 302.434 x 30 = 9.073.020 quads / sec → 18.146.040 triangles / sec
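The VDP2 cell counts and the combined totals can be checked with a small script. It is just the arithmetic of this point written out: the function name is hypothetical, and each 8×8 cell is counted as one textured quad, following the second criterion chosen earlier.

```python
def map_cells(plane_w, plane_h, planes_x, planes_y):
    """Cells in a VDP2 map: plane size in cells times the grid of planes."""
    return (plane_w * planes_x) * (plane_h * planes_y)

rbg0 = map_cells(128, 128, 4, 4)      # infinite floor, 512x512 cells
nbg0 = map_cells(128, 64, 2, 2)       # background, 256x128 cells
nbg2 = map_cells(64, 64, 2, 2) // 3   # UI, only about a third in use

vdp2_quads = rbg0 + nbg0 + nbg2
vdp1_quads = int(1374 * 1.5)          # split screen peak plus undrawn factor
frame_total = vdp1_quads + vdp2_quads
per_second = frame_total * 30
```

This reproduces 262144, 32768 and 5461 cells for the three planes, 300373 for the VDP2 as a whole, and 302434 quads per frame once the VDP1 figure is added.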

As we can see, the figure is huge. But I want to make clear that this is the real amount of information that the SS is able to process as a whole. It is how the machine does it.

3) Being aware that this is hard to take in, I am going to convert this data into equivalents of what would be done on PSX, obtaining approximate results, but never the same:

  • Flat ground 3D “infinity” = 8×8 pattern / 16×16 Tile / 32×32 Plane / 32 Maps = 1024 quads
  • Background = 8×8 pattern / 16×16 Tile / 32×16 Plane / 16 Maps = 128 quads
  • UI = 8×8 pattern / 8×8 Tile / 8×8 Plane / 8 Maps = 64 quads

Total VDP2 equivalent quads = 1.216 quads

Total transformed, illuminated, drawn and not drawn but calculated by VDP1 = 1.374×1.5 = 2.061 quads per frame.

Total VDP1 at 30 FPS = 2.061 x 30 = 61.830 quads / sec = 123.660 triangles / sec

Total VDP1 + Total VDP2 = 2.061 + 1.216 = 3.277 quads per frame

Total VDP1 + VDP2 at 30FPS = 98,310 quads / sec → 196.620 triangles / sec

Finally, we have “more” reasonable figures. And with the VDP1 part alone at 30 FPS, they are equal, if not superior, to PSX in an equivalent game.

Millions of polygons per second?

A brief final explanation of the title of this section, which reads 500 / 1.000 / 1.500 / ~2.000 quads per frame. Someone may wonder: didn’t the SS and PSX move hundreds of thousands or millions of polygons? Answers:

First, that is a big lie on both machines, as we have been able to verify.

And second, the REAL total is obtained by multiplying the FPS by the quads on screen in one frame. On average, the most powerful SS games were at around 1.200 on-screen elements, according to Yabause. That does not count the quads that are calculated but not drawn, which would be 1.5 times more: around 2.000 quads in one frame. That is, some 60.000 quads/sec at 30 FPS; in triangles, 120.000 triangles/sec at 30 FPS. These were the real maximum figures for those machines at that time.

Final conclusion:

In summary, I have been able to verify that, with some exceptions (Crypt Killer for example), ports and multi-platform games from 1995 to 1997 moved very similar polygon counts and texture quality on both machines. The amount of FPS and its stability is another matter.

In exclusive games until 1998, things were level in terms of polygons on screen: about 1,500 (2,000 total calculated) polys at 30 FPS. From 1999 it is another story: with the PSX on its own, developers managed to raise the figures to 2,000 (2,600 total calculated) per frame, even 2,500 (3,750 total calculated), at the cost of sacrificing FPS, resolution, texture quality, Gouraud shading or dynamic lighting. Could the SS have reached these figures? As I have said before, that is another story, although I think so, just like the PSX itself: optimizing, optimizing, optimizing…

In the end, was this disadvantage fixed?

Yes, partly, especially by the first parties and some notable second / third parties.

I have been able to check the FPS in many games thanks to the latest versions of Yaba Sanshiro. In some cases the game still does not work there, but its smoothness can clearly be seen on the real console. I have also been able to check the number of polygons thanks to the git version of Yabause. Games for which the amount of “polygons” has been calculated are marked with PolyCount. The numbers and the calculation are in this online spreadsheet, which I will update as necessary:

I have left out games that run at a stable 20 FPS, because what is really remarkable, for me, is reaching 25/30 FPS or more. So some great and/or interesting games have been left out, such as:

  • Panzer Dragoon (1995-03) → 20 FPS: 16BPP. Without lighting. SD.
  • Daytona USA (1995-04) → 20 FPS: 16BPP. Without lighting. SD. PolyCount
  • Touge King – The Spirits (1995-11-10) → 20 FPS: 16BPP. Without lighting. SD. PolyCount
  • Wipeout (1996-02) → 20 FPS: 16BPP. Without lighting. SD. PolyCount
  • Ghen War (1995-10-26) → 20 FPS: 16BPP. Dynamic lighting. SD. PolyCount
  • Titan Wars (1995-12-20) → 20 PAL FPS: 16BPP. Without lighting. SD.
  • Panzer Dragoon Zwei (1996-03) → 20 FPS: 16BPP. Without lighting. SD.
  • Congo: The Movie – The Lost City of Zinj (1996-03-07) → 20 FPS: 16BPP. Dynamic lighting. SD. PolyCount
  • Andretti Racing (1996-12-20) → 20 FPS: 16BPP. Without lighting. SD. PolyCount
  • MechWarrior 2: 31st Century Combat (1997-04-01) → 20 FPS: 16BPP. Dynamic lighting. SD. PolyCount
  • Wipeout 2097 (1997-09) → 20 FPS: 16BPP. Without lighting. SD. PolyCount
  • Nascar 98 (1997-11-13) → 20 FPS: 16BPP. Without lighting. SD. PolyCount
  • Panzer Dragoon Saga (1998-01) → 20 FPS: 16BPP. Dynamic lighting. SD. PolyCount
  • Burning Rangers (1998-02) → 20 FPS: 16BPP. Dynamic lighting. SD. PolyCount

Now we go with the list of the remarkable ones (some more than others):

First parties:

  • 50/60 FPS:
    • Clockwork Knight: Pepperouchau’s Adventure (1994-12) → 60 FPS: 16BPP. SD.
    • Clockwork Knight 2 (1995-07) → 60 FPS: 16BPP. SD.
    • Virtua Fighter 2 (1995-12-01) → 60FPS: FullHD. 8BPP. Without lighting. PolyCount
    • DecAthlete (1996-07-12) → 60 FPS: FullHD. 8BPP. Without lighting.
    • Virtua Fighter Kids (1996-07-26) → 60 FPS: FullHD. 8BPP. Without lighting.
    • Fighting Vipers (1996-08) → 60 FPS: Dynamic lighting. 16BPP. SemiHD.
    • Fighters Megamix (1996-12) → 60 FPS: Dynamic lighting. 16BPP SemiHD. PolyCount
    • Last Bronx (1997-08) → 60 FPS: Without Lighting. 8BPP FullHD. PolyCount
    • Digital Dance Mix Vol.1 Namie Amuro (1997) → 60FPS: FullHD. 8BPP. Without lighting.
    • All-Japan Pro Wrestling featuring Virtua (1997-10) → 60 FPS: Dynamic lighting. 16BPP SemiHD.
  • 25/30 FPS:
    • Virtua Fighter 1 (1994-11) → 30 FPS: Flat dynamic lighting. 8BPP. FullHD. PolyCount
    • Virtua Fighter Remix (1995-11) → 30 FPS: 8BPP. FullHD. PolyCount
    • Sega Rally (1995-12-29) → 30 FPS: Without lighting. 16BPP. SD. PolyCount
    • Virtua Cop 1 (1995-11-24) → 30 FPS: Without illumination. 16BPP. SD.
    • NiGHTS into Dreams (1996-07-05) → 30 FPS: Dynamic lighting. 16BPP SD. PolyCount
    • Daytona USA CEE (1996-11) → 30 FPS: Without lighting. 16BPP. SD. PolyCount

Second parties:

  • 50/60 FPS:
    • Guardian Heroes (1996-01-26) → 60 FPS: 16BPP. SD.
    • Time Warner Interactive’s V.R. Virtua Racing (1995-12-18) → 60 FPS: With occasional drops to 25 FPS. Flat dynamic lighting. 16BPP. SD. PolyCount
    • Silhouette Mirage (1997-09-11) → 60 FPS: 16BPP. SD.
  • 25/30 FPS:
    • Wing Arms (1995-09) → 30 FPS: Dynamic lighting. 16BPP SD. PolyCount
    • Ghen war (1995-10-26) → 30/40 FPS: Pause in sight mode. Dynamic lighting 16BPP. SD. PolyCount
    • Manx TT (1997-03) → 30 FPS: Without illumination. 16BPP. SD. PolyCount
    • The Lost World Jurassic Park (1997-10-07) → 30 FPS: Dynamic lighting. 16BPP. SD. PolyCount
    • Duke Nukem 3D (1997-10-29) → 30 FPS: With occasional drops to 20 FPS. PolyCount
    • Sonic R (1997-11-21) → 30 FPS: Dynamic lighting. 16BPP. SD. PolyCount
    • Sega Touring Car Championship (1997-11-24) → 30 FPS: In menus and qualifying. During races it fluctuates a lot between 20 FPS and 30 FPS. Without lighting. 16BPP. SD.
    • Quake (1997-12-03) → 30 FPS: Occasional drops to 15 FPS. Dynamic lighting. 16BPP. SD. PolyCount
    • Shining Force III (1997-12-11) → 30 FPS: On the map, occasional drops to 20 FPS. Battles at 30 FPS. PolyCount
    • Stellar Assault SS (1998-02-26) → 30 FPS (NO YabaSanshiro): Dynamic lighting. 16BPP. SD. PolyCount
    • House of the Dead (199xx) → 30 FPS: Without lighting. 16BPP. SD.

Third parties:

  • 50/60 FPS:
    • Road Rash (1996-07-26) → 60 FPS: With drops to 25 FPS and very occasionally to 20 FPS. Apparently due to the VDP1 problem of excessive overdraw in large “quads to tris”. Without lighting. 16BPP. SD. PolyCount
    • Skeleton Warriors (1996) → 60 FPS: Without lighting. 16BPP. SD.
    • Street Racer Extra (1996) → 60 FPS: Without lighting. 16BPP. SD. PolyCount
    • Striker ’96 (1996-05) → 60 FPS: Without lighting. 16BPP. SD.
    • Hardcore 4×4 (1996-12) → 60 FPS: In 3D menus with drops at 30FPS
    • Tempest 2000 (1996) → 60 FPS: During the game. Render-to-texture effects in menus at 10FPS. Without lighting. 16BPP. SD.
    • Dead or Alive (1997) → 60 FPS: Without Lighting. 8BPP. FullHD. PolyCount
    • Goiken Muyou: Anarchy in the Nippon (1997) → 60 FPS: Without Lighting. 8BPP. FullHD.
    • Mass Destruction (1997) → 60 FPS: Without illumination. 16BPP. SD. PolyCount
    • Terra Cresta 3D (1997) → 60 FPS: Lighting? 16BPP. SD.
    • Shinrei Jusatsushi Taromaru (1997) → 60 FPS: Lighting? 16BPP. SD.
    • Thunder Force V (1997) → 60 FPS: Occasional drops to 35 / 40 FPS. 16BPP. SD.
    • Layer Section II (1997) → 60 FPS: Lighting? 16BPP. SD.
    • Zero Divide: The Final Conflict (1997) → 60 FPS: Without lighting. 16BPP. SD.
    • League Go Go Goal! (1997) → 60 FPS:
    • Jonah Lomu Rugby (1997) → 60 FPS: Without lighting. 16BPP. SD.
    • Radiant Silvergun (1998) → 60 FPS: Dynamic lighting. 16BPP. SD.
    • Savaki (1998) → 60 FPS: Dynamic lighting. 16BPP. SD.
    • Akumajo Dracula X: Gekka no Yasokyoku (aka Castlevania: Symphony of the Night) (1998) → 60 FPS: With occasional drops to 30 FPS, possibly due to the V-sync and lack of optimization. Lighting? 16BPP. SD.
  • 25/30 FPS:
    • Titan Wars (1995-12-20) → 25 FPS NTSC: Without lighting. SD.
    • Need for Speed (1996-07) → 30 FPS: Without lighting. 16BPP. SD. PolyCount
    • Loaded (1996-06) → 30 FPS: With occasional drops to 20 FPS. Dynamic lighting. 16BPP. SD. PolyCount
    • Exhumed (E) / Powerslave (U) (1996-09) → 30 FPS: With occasional drops to 20 FPS. Dynamic lighting. 16BPP. SD. PolyCount
    • Battle Arena Toshinden URA (1996-09-27) → 30 FPS: Without lighting. 8BPP. SemiHD. PolyCount
    • Tomb Raider (1996-10) → 30 FPS: With occasional drops to 15 FPS. Dynamic lighting. 16BPP. SD. PolyCount
    • Judgment Force Proto / Fighting Force Rolling Demo (1996) → 30 FPS. Without lighting, like PSX. 16BPP. SD. PolyCount
    • Hardcore 4×4 (1996-12) → 30 FPS: With drops to 15 FPS with all the cars on screen and at certain points of the circuits with a single car. Dynamic lighting. 16BPP. SD. PolyCount
    • Touge King the Spirits 2 (1997-04-18) → 30 FPS: Dynamic lighting. 16BPP. SD. PolyCount
    • Resident Evil (1997-07) → 30 FPS: Dynamic lighting. 16BPP. SD. PolyCount
    • Darklight Conflict (1997-07-17) → 25 PAL FPS / 30 NTSC FPS: Dynamic color lighting, just like PSX. 16BPP. SD. PolyCount
    • Croc: Legend of the Gobbos (1997-09-17) → 30 FPS. Same geometry and textures as PSX. Precalculated lighting in stages, just like PSX. Lighting without a light source on assets and protagonist; on PSX it uses a light source. Dynamic lighting in menus. 16BPP. SD. PolyCount
    • Black Dawn (1996-12-10) → 30 FPS: Occasional drops to 20 FPS. Without lighting, just like PSX. 16BPP. SD. PolyCount
    • NHL 97 (1996-12) → 30 FPS: 3D menus and transitions from player to match run at 60 FPS. Without lighting. 16BPP. SD.
    • Fighting Illusion K1 Grand Prix (1997-01-31) → 30 FPS: Dynamic color lighting, 16BPP. SemiHD. PolyCount
    • Drift King Syutokoh Battle 97 / Shutokou Battle ’97: Tsuchiya Keiichi & Bandou Masaaki (1997-02-28) → 30 FPS: Dynamic lighting. 16BPP. SD. PolyCount
    • Pandemonium! (1997-02-28) → 30 FPS (NO Yaba Sanshiro): With occasional drops to 20 FPS where there is a large area of VDP1 half-transparency and at specific levels such as the last one (lighting and / or VDP1 half-transparency). Dynamic lighting. 16BPP. SD. PolyCount
    • Hexen (1997-03) → 30 FPS: With occasional drops to 20 FPS.
    • FIFA 97 (1997-04) → 30 FPS: Without lighting. 16BPP. SD. PolyCount
    • D-Xhird (1997-05) → 30 FPS: Dynamic lighting. 16BPP. SD. PolyCount
    • Bulk Slash (1997-07-11) → 30 FPS. Lighting in Briefing. Without lighting in Game. 16BPP. SD. PolyCount
    • FIFA 98 (1997-12) → 30 FPS: Without lighting. 16BPP. SD.
    • Virtual Kyōtei 2 (1997-12-04) → 30 FPS: With strong drops to 15 FPS when the camera shows most of the circuit into the distance while rotating. Dynamic lighting. 16BPP. SD.
    • World League Soccer 98 (1998-06-05) → 30 FPS: Totally stable during the match. Without lighting. 8BPP. FullHD.
    • Code R (1998-07-09) → 30 FPS: Totally stable in playable parts. Only the manga / comic style replays run at 15 FPS. Dynamic lighting. 16BPP. SD. PolyCount

Finally, once again I will pick three of the best games in this section. They are:

Judgment Force Proto / Fighting Force Rolling Demo

Judgment Force, which is how I refer to the early Fighting Force, is a prototype dated 1996-11-26. It takes very good advantage of the SS. Both SH2 CPUs are used, and so is the SCU, which reaches 26% use. It makes good use of main RAM and the VRAMs, reaching 77% of the total. The sound DSP reaches 20% use, with CD-DA for the music. One detail worth noting is that, as in the rest of Core Design’s games, it does not use the HSS register to speed up VDP1 drawing. It could be that they had already optimized their drawing pipeline for SS and did not want to use HSS to avoid losing drawing quality, having plenty of drawing time with their code.

It seems that this development was started for SS, as Core’s proposal to Sega for a Streets of Rage 4. It seems that it could not be, and it became the multi-platform game we know, with the final SS version lost along the way. All versions share the same screen resolution, textures and color depth for screen and textures, with the same amount of geometry and content and slight differences per machine. The SS version runs at a constant 30 FPS. And one detail gives away that SS was the console in mind for this game: the ground is flat and extends to infinity. On the SS, with the VDP2, this was easy to do at very high quality. On the other machines of the time it was another story. On PSX, PC and N64 you can see a trick on the horizon that tries to imitate the VDP2 of the SS. On PC and N64 there is Gouraud color lighting as an extra, and on PC better textures as well. It is clear that by the end of 1996 Core Design already had a good grip on the SS.

In the 1996 proto you can see VDP1>VDP2 transparencies used for shadows, lighting or power trails. These were lost in the 1997-03-14 Rolling Demo, already named Fighting Force, where only “mesh” is used for all these cases. The game was finally released on 1997-11-15, one year after the prototype. There was plenty of time to finish the SS version and even improve it, since it belonged to the 3rd wave of SS developments, a time when quality was already equal to PSX in most aspects. But that is another story.

Darklight Conflict

Published on 1997-07-17, at the end of the 3rd wave of SS games, Rage makes amends for Doom, both released in the same year. It is true that they had the EA umbrella, which, when it put a team with time and resources to work on SS, achieved spectacular results. In this game Rage and EA use the SS masterfully. The 3D engine is on par with PSX without any doubt: geometry and dynamic color lighting. They even find ways to use SS transparencies with great success, in this case of the VDP1>VDP2 type. All at a rock-solid 30 FPS.

Perhaps this Rage team, the same as for Doom, could have gone even further with transparencies in a next title of this scope. As for the CPUs and coprocessors, all are used masterfully: both SH2s and the SCU-DSP with its DMA, reaching a level of 54.25%. They used 62% of the available main memory. In the VRAMs, with the PSX content and a higher resolution mode, no more textures would fit. In addition, the SS has 0.5MB of memory for its VDP2, always a plus, although one that was not always exploited to the same degree. All in all a great use, reaching 85% of the VDP1.

In the SS version the 3D engine runs at a lower horizontal resolution than on PSX, 352 vs 512, but at the same vertical resolution: 256. It is the only important difference from PSX, besides the usual ones: more transparencies and somewhat more use of the full screen in the FMV.

At the sound level, it does not use the DSP for effects such as reverb, and CD-DA is used for music. In this respect the SS could have been exploited much more, with dynamic orchestrated music and reverb for the space setting.

The FMV is handled by the EA codec and its decompression libraries, which push the SS to the limit, bringing MPEG to the console in compression, color quality and audio. It uses the two SH2s together with the SCU-DSP up to a limit of 64%, using the SCU-DSP in a similar way and with similar performance to the MDEC of the PSX.

Croc: Legend of the Gobbos

Published on 1997-09-17, also at the end of the 3rd wave of SS games. Argonaut programmed a 3D engine designed to compete with Mario 64 itself, built for PC, PSX and SS, which keeps its main characteristics across platforms. The game content (levels, items, music, cinematics) is the same in all versions, including the 3D content: amount of geometry and texture quality, including the main Gouraud shading. The only notable differences between PSX and SS are the light-source lighting on the main character and some assets during the game, fewer transparencies and slightly less horizontal resolution: PSX reaches 512 while SS stays at 352. The menu designs are also slightly different, but with the same content, and the loading screens differ. The 3D assets of the menus do use light-source lighting with Gouraud in this case. Finally, I would highlight the very intelligent use of the VDP2 for the UI, backgrounds and flat floor elements, the latter using the infinite rotated plane of the VDP2.

As for sound, it is worth noting the use of Dolby Surround on SS, something very uncommon that was nevertheless more widely used on PSX, proving that the SS could handle Dolby perfectly as well. It also uses ADPCM for sound effects and voices, which is less usual on SS but also possible. We can highlight that the SS sound DSP uses 50% of its memory. However, the DSP echo or reverb in the cave areas, present on PSX and also possible on SS, is missing.

It takes remarkable advantage of the machine, using the two SH2s and the SCU-DSP for graphics, the latter at 50%, to obtain a constant 30 FPS, matching PSX in smoothness and stability for this type of 3D game. It also makes an average use of 77% of the total memory resources of the SS. A large number.

We can notice that Croc, like many other titles of the time, was developed for PC / PSX simultaneously, given how close they are technically for development, and that the SS version differs enough to conclude that it was a separate development. Details such as: the FMV of the opening logos, the design of menus with identical content, and some texture changes in the same places.

In the end, despite these differences, the result in this game is satisfactory: not perfect, but very remarkable. What a pity that Argonaut did not round off its presence on SS with Croc 2, which could have improved the details lacking in this first one.

3.1 The complicated I – EXTRA Ball I: Use of the SCU-DSP

As we were saying, few developers used the SCU-DSP of the SS. Not the SCU in general, since Sega’s libraries obviously used it. We mean using the SCU-DSP specifically to calculate and accelerate games in critical areas such as FMV, image processing/rendering, audio decompression and 3D processes such as geometric transformations or dynamic lighting. Who knows, perhaps also physics or collision calculation.

Theoretically, even several tasks at once. The SCU-DSP was a processor specialized in processing small amounts of data while executing up to 6 instructions in a single cycle. The problem was that, to achieve this, the processes, the tasks, the data types, the memory banks and the DMA to be used had to be designed with extreme care, since the SCU-DSP runs interleaved with and in parallel to the rest of the processes of the SS: the two SH2s, the accesses to the memory banks, the VDP1 and VDP2 graphics processors, the sound system with the M68000 CPU…

It worked exclusively with the Master SH2 and the fast RAM through a dedicated DMA channel. In many ways the SCU-DSP and the GTE are extremely similar. In essence, both were specialized DSP-style processors able to run operations in parallel with basic, fast instructions. We could say that it could have been exposed at the SDK level in the same way as the GTE: that is, with pre-programmed macros, written in assembler, ultra-optimized and accessible from C code to make programming easier, for specific vector math or 3D graphics tasks: matrix transformation, lighting matrices, scalar calculation, dot product, normal clipping and depth cueing.
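To make the kind of operations listed above more concrete, here is a minimal sketch of three of them (matrix transformation, dot product and normal clipping) in fixed-point arithmetic. It is written in Python purely for illustration: it is not SCU-DSP or GTE code, the function names are hypothetical, and the 16.16 fixed-point format is an assumption chosen for the example.

```python
FRAC_BITS = 16          # assumed 16.16 fixed-point format
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float or int to 16.16 fixed point."""
    return int(round(x * ONE))

def fx_mul(a, b):
    """Multiply two fixed-point numbers."""
    return (a * b) >> FRAC_BITS

def dot3(a, b):
    """Dot product of two 3-component fixed-point vectors."""
    return fx_mul(a[0], b[0]) + fx_mul(a[1], b[1]) + fx_mul(a[2], b[2])

def transform(matrix, translation, v):
    """Rotate v by a 3x3 matrix (three row vectors), then translate it."""
    return tuple(dot3(row, v) + t for row, t in zip(matrix, translation))

def is_back_facing(normal, view_dir):
    """Normal clipping: cull a face whose normal points away from the viewer.

    view_dir is the direction from the camera towards the face.
    """
    return dot3(normal, view_dir) >= 0

IDENTITY = ((ONE, 0, 0), (0, ONE, 0), (0, 0, ONE))
moved = transform(IDENTITY, (to_fixed(1), 0, 0),
                  (to_fixed(2), to_fixed(3), to_fixed(4)))
```

Each of these routines is a short chain of multiply and accumulate steps, which is the kind of small, repetitive work that a pre-programmed macro would wrap.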

But Sega did not go in this direction, and left the possible use of this part very much out of reach of the more modest developers, the vast majority. I am convinced that the SCU-DSP could match the GTE in calculation efficiency and in use, if not for 100% of the registers that existed in the GTE, then for a great majority. In fact, it is quite possible that some ports tried to do something similar, in some cases even on the SCU-DSP itself, but reaching the same level of optimization as Sony's macros for its GTE was, especially at that time, almost impossible or very unlikely.

Having said all this, we could apply the same reasoning to the second coprocessor integrated in the PSX CPU, the MDEC. The SCU-DSP can do that task in the same way and with the same efficiency. In fact, it is more than likely that some developers used it this way, as we will see in the FMV point.

As we can see, that is a very significant set of variables to keep under control. Despite this, there were very good and very satisfactory examples of its use.

In the end, was this disadvantage overcome?

NO. Although there are special cases where the DSP of the SCU was used to help with the 3D calculations, similar to the GTE of the PSX. Bear in mind that of 258 games analyzed, 44 make clear use of the SCU-DSP for 3D processing or FMV decompression. That is about 17% of the games analyzed, a figure we could extrapolate to the whole catalog.

Note also that not all the games using the SCU-DSP for 3D use it optimally: performance is not always optimal, nor is the percentage of memory or registers in use always high.

Finally, there are still many cases where it is not certain that the SCU-DSP is being used: a similar percentage, 12% of the games analyzed. In those it could be used for DMA transfers between buses or, in some cases, perhaps to decompress ADPCM sound.

Of this 12%, only 6 games (approx. 2.33%) are using the DMA of the SCU-DSP with maximum certainty, because DMA Timing has to be enabled in Yabause for them to look right. This does not mean that it is not used in other cases. In these 6 cases, at least, its use for 3D transformations can be clearly established.

We would also need to single out the cases where the SCU-DSP works alone with the master SH2. According to information from the time, this could happen because it was theoretically simpler to use just one SH2 plus the DSP of the SCU, and also because at the beginning it was not very clear how to use the slave SH2. Yet the two SH2 plus the SCU-DSP were used from the very start, in Virtua Fighter 1 itself, and later in more titles, as we have said. Still, there are several titles with this 1xSH2 + SCU-DSP combination. In my research I found only 4 such games among those analyzed, a small share of the total: 1.5%.
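As a quick check of these shares, the percentages can be recomputed from the counts given above (258 games analyzed, 44 with clear use, 6 with certain DMA use, 4 with one SH2 plus the SCU-DSP). The function name is only illustrative.

```python
ANALYZED = 258  # games analyzed in this section


def share(count, total=ANALYZED):
    """Percentage of the analyzed games, rounded to two decimals."""
    return round(100.0 * count / total, 2)


clear_use = share(44)        # clear SCU-DSP use for 3D or FMV
dma_certain = share(6)       # DMA use confirmed through Yabause's DMA Timing
one_sh2_plus_dsp = share(4)  # 1xSH2 + SCU-DSP titles

print(clear_use, dma_certain, one_sh2_plus_dsp)
```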

That said, let us look, by developer, at the games where the SCU-DSP is clearly used:

First parties:

  • Virtua Fighter 1 (1994) → 2xSH2 + SCU-DSP Transformations and illumination of vertices
  • Virtua Fighter Remix (1995) → 2xSH2 + SCU-DSP Transformations of vertices
  • Burning Rangers (1998) → 2xSH2 + SCU-DSP Possible ADX decompression and lighting
  • SEGA GAME SAMPLE 1 For SBL 1.1 → 1xSH2 + SCU-DSP Transformations and illumination of vertices DMA On ← except the “Carpet” example

Second parties:

  • Quake (1997) → 2xSH2 + SCU-DSP Possible drawing Sky by layers and/or lighting and/or vertex transformations
  • Shining Force III (1997) → 2xSH2 + SCU-DSP Render of Textures with gradients and/or procedural
  • Sonic R (1997-11-21) → 2xSH2 + SCU-DSP Geometric transformations.

Third parties:

  • Wipeout 1 (1996) → 2xSH2 + SCU-DSP Transformations of vertices DMA On
  • Lemmings 3D (1996) → 1xSH2 + SCU-DSP DMA On
  • Destruction Derby (1996) → 2xSH2 + SCU-DSP Transformations of vertices DMA On
  • Tokusou Kidoutai J SWAT (1996) → 1xSH2 + SCU-DSP
  • Pandemonium! (1997-02-28) → 2xSH2 + SCU-DSP Transformations and illumination of vertices
  • MechWarrior 2: 31st Century Combat (1997) → 1xSH2 + SCU-DSP Transformations and illumination of vertices
  • Touge King the Spirits 2 (1997-04-18) → 2xSH2 + SCU-DSP Possible Geometric Transformations.
  • FIFA 97 (1997) → 2xSH2 + SCU-DSP Transformations of vertices DMA On
  • Darklight Conflict (1997) → 2xSH2 + SCU-DSP Transformations and illumination of vertices DMA On
  • Assault Rigs (1997) → 1xSH2 + SCU-DSP Transformations and illumination of vertices DMA On
  • Dead or Alive (1997-10-09) → 2xSH2 + SCU-DSP Transformations of vertices, possible DMA On.
  • Croc: Legend of the Gobbos (1997) → 2xSH2 + SCU-DSP Transformations and illumination of vertices DMA On and possible decompression ADPCM
  • Virtual Kyōtei 2 (1997-12-04) → 2xSH2 + SCU-DSP. Transformations and illumination of vertices.
  • Code R (1998) → 2xSH2 + SCU-DSP, Possible ADX decompression.

As I have been doing, I pick 3 games that stand out on this point for a more extensive analysis:

Virtua Fighter 1

This was the first game released for SS, back on 1994-11-22. Yes, almost 24 years ago. I was surprised that it uses the two SH2 plus the SCU-DSP, the latter quite heavily: 95% of its memory and 48% of its registers. Strangely, RBG0 is not used for the ground; quads are used instead, with a more obvious clipping cone. Compared with the arcade, the conversion reduces the geometry of the fighters in details such as the fingers, going from 2300 polygons to 550 per character. The rest of the game is intact, including lighting (1 dynamic parallel light source), content and colors. According to official data the game moves 39,000 quads per second: as we said, approx. 1100 for the characters and 220 for the stage. Everything runs at 30FPS as in the arcade, or according to some even better, because the arcade can drop to 15FPS at times. The resolution, 640×224 in NTSC and 704×256 in PAL, is not bad compared to the arcade's 496×384. Total use of the available system memory reaches 60%; VDP1 use is really low at 30%, rising to 60% in the Remix version that came out a few months later. The sound DSP uses up to 60% of its memory. This was possibly one of the first projects used to create the SGL library, widely used in later AM titles. Finally, the Remix version is practically the same program, but with textured quads instead of flat ones. The polygon count per character is reduced, though it is hardly noticeable. The VDP2 is used for the ground, gaining detail and losing the cuts, while the dynamic lighting of the original is lost.
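The official figure and the per-frame numbers fit together reasonably well, as this small calculation shows. It only uses the values quoted above (39,000 quads per second, 30FPS, about 550 quads per fighter and 220 for the stage); the function name is illustrative.

```python
def quads_per_frame(quads_per_second, fps):
    """Average number of quads available in each frame."""
    return quads_per_second / fps


budget = quads_per_frame(39_000, 30)  # per-frame budget from the official figure
scene = 2 * 550 + 220                 # two fighters plus the stage

print(budget, scene)
```

The two values differ by only about 20 quads, which is within the "approx." of the per-character count.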

MechWarrior 2: 31st Century Combat

Launched on 1997-04-01, at the beginning of the 3rd wave of developments. MechWarrior 2, coming from PC, is a worthy conversion. It is perhaps one of the few games that uses both RBGx planes of the VDP2, to draw the floor and the sky. It has another peculiarity: it may be one of the few games that uses VDP2 transparency with textured, Gouraud-shaded VDP1 elements. It has Gouraud shading with both source-based and static lighting. It uses the two SH2 for 3D and FMV, plus the SCU-DSP for 3D, reaching 50% use of its memory and 43.5% of its registers. It also uses the sound DSP, with 30% memory usage. Total system memory use is 71%, reaching 90% on the VDP1. The game is displayed at 320×240 at a constant 20FPS, and uses LOD for stage elements and assets. The SS version came out after the PSX one, by less than a month. In general terms, apart from the transparencies and a few other details, the two games are the same. As a final curiosity, it has a proprietary video format that gives very interesting quality by SS standards.

FIFA 97

Launched on 1997-12-17, at the end of the 3rd wave of developments for SS, 4 months later than the PSX version. The lack of transparencies, both in menus and in the 3D part, is questionable. Still, this is a remarkable port: the level of geometry, textures and animations is identical to that of the PC / PSX versions. The commentary and music are intact, like the rest of the game's content. The game runs at 320×240 in 16-bit color at a stable 30FPS, showing more than 1000 quads on screen, just as on PSX, except that there the resolution is slightly higher, 512×240. It uses RBG0 of the VDP2 for the pitch, giving it a quality the PSX cannot match; only the horizon gradient, entirely possible and simple on the VDP2, is missing. It uses LOD on the players and on the shadows; the shadow LOD is more aggressive on SS, which shows only 2 full shadows against 5 or 6 on PSX. At least these shadows are transparent (VDP1 > VDP2) over the pitch. It makes good use of the SS's remarkable processing capacity, using the two SH2 plus the SCU-DSP for 3D and FMV, the latter with 90% memory usage and 43.47% of registers. In addition, the sound DSP uses up to 35% of its memory. The SS version has no Dolby, unlike the PSX version, although it would have been quite possible. However, there are strong indications that ADPCM is used for music, crowd, commentary… Something unusual on the system, and welcome. 64% of total system memory is used, and main RAM use does not reach 50%. Finally, the FMV quality of this game is immense. As I keep finding, EA had excellent video codecs for SS, and this game probably has one of the best in terms of quality (DCT), color (32bit), resolution (304×192) and sound (PCM Mu-Law / Stereo / 8bits / 22Khz). The only thing missing is the VDP2 zoom to stretch the video to full screen, which is odd.

3.2 The complicated I – EXTRA Ball II: Resolution screen SD / HD

As I progressed on this journey of research into the guts of the SS, I ran into issues I had not even considered when I started this post, such as the 3D rendering resolution of games. I kept finding forum posts and opinions that compare it in PSX ports, or in similar games, as a way to measure the power of both consoles.

From what I have seen, most ports or versions use the same resolution, a little more on one machine or the other depending on the port: 320×240 on PSX and 351×240 on SS in some games, or 320×256 on PSX and 320×224 on SS. In Tomb Raider, 364×240 on PSX and 352×256 on SS. Small differences. There are also bigger jumps in very similar games such as Croc, Darklight Conflict and Die Hard Trilogy (airport level): 512×240 on PSX against 352×240 on SS.

Or compare two similar games such as Burning Rangers and Soul Reaver. Their engines are very similar in many respects, but the PSX one runs at 512×256 and the SS one at 351×224, quite a few pixels fewer in both X and Y. Both render in 16-bit color at a similar frame rate: 25FPS on PSX and 20FPS on SS. It is true that they are not strictly comparable, since the PSX game belongs to the 4th generation and the SS one to the 3rd: 1999 against 1998. One year makes a big difference.

In most games the textures are similar or identical, allowing for the different way each machine textures. So why does SS not reach these resolutions in similar games? The answer lies in the different memory layout of the two machines and the restrictions that follow when the resolution goes up, and also in which resolutions are actually supported. On SS, at neither 16BPP nor 8BPP can X take the values 256, 364, 384 or 512, even though there is framebuffer space for them. In Y there is a similar set of values, with different restrictions.

When you raise the resolution, you not only have to process more data, you also have to store it. And here the SS had another snag. Going above 351 in X and/or 256 in Y puts you in the Hi-Res mode of VDP1. You are forced to go from 16-bit to 8-bit rendering, and as a result you can no longer use color calculations such as Gouraud or semi-transparency. In exchange, texture capacity grows relative to 16-bit mode: the space for textures is the same, but with textures at 8BPP even more of them fit.
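The storage side of this trade-off can be sketched with some simple arithmetic. I am assuming here a budget of 256 KB for one VDP1 framebuffer, which is the figure usually quoted for the hardware but is not stated above, and I ignore the fixed framebuffer layouts of the real chip, so take it as a simplified illustration with hypothetical function names.

```python
FRAMEBUFFER_BYTES = 256 * 1024  # assumption: one VDP1 framebuffer holds 256 KB


def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to store one frame at the given size and color depth."""
    return width * height * bits_per_pixel // 8


def fits(width, height, bits_per_pixel, budget=FRAMEBUFFER_BYTES):
    """True if one frame of that size fits in the framebuffer budget."""
    return framebuffer_bytes(width, height, bits_per_pixel) <= budget


for mode in [(352, 256, 16), (704, 256, 16), (704, 256, 8)]:
    print(mode, framebuffer_bytes(*mode), fits(*mode))
```

Under these assumptions a 704-wide frame only fits once the depth drops to 8 bits per pixel, which is the same byte count as the 352-wide frame at 16 bits.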

What the SS can do, like the PSX, is play with the resolution through interlacing: a maximum of 256 in Y, doubled, gives 512 in Y in total. Many games take advantage of full VDP1 Hi-Res plus double-interlaced VDP2 Hi-Res.

In this respect, as in others, the PSX was more flexible. It can raise the resolution and keep 16-bit color, but its capacity to store textures shrinks quickly. That is why the games that use these modes have fewer textures and use more Gouraud to make up for it.
This does not happen on SS: raising the resolution never costs texture storage, only color depth.
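A rough way to picture the PSX side is to subtract the framebuffers from its video memory. The sketch below assumes the 1 MB of VRAM of the PSX, 16-bit pixels and a configurable number of display buffers; the function names and the choice of modes are mine, and real games organized their VRAM in more varied ways.

```python
PSX_VRAM_BYTES = 1024 * 1024  # assumption: 1 MB of VRAM shared by frames and textures
BYTES_PER_PIXEL = 2           # 16-bit color


def frame_bytes(width, height):
    """Bytes taken by one 16-bit frame."""
    return width * height * BYTES_PER_PIXEL


def texture_budget(width, height, buffers=2):
    """VRAM left for textures and palettes after reserving the display buffers."""
    return PSX_VRAM_BYTES - buffers * frame_bytes(width, height)


print(texture_budget(320, 240))             # low resolution, double buffered
print(texture_budget(512, 240))             # wider frame, double buffered
print(texture_budget(512, 480, buffers=1))  # interlaced frame, single buffer assumed
```

The space left for textures falls as the frame grows, which matches the way high-resolution PSX games lean more on Gouraud and less on textures.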

Overall, though, we can say that SS lost more than it gained, because losing shading in general, or the color calculation functions of the VDP1 rasterizer, is a lot in my view. The VDP2 could also lose options in Hi-Res mode, but only in the special one, which is a mode intended for monitors.

In the end, was this disadvantage overcome?

Yes, in part. There are high-resolution games on SS, even above the PSX maximum, and very smooth. But, as an almost general rule, without Gouraud or transparency. One game does manage transparency with Hi-Res VDP2, though only in the shadows: Last Bronx, which renders 3D at 351×511 in 16-bit color on the VDP1 and outputs 704×256 with double interlace from the VDP2, for a result of 704×512 for the graphics drawn by the VDP2. As we can see, not bad at all. We get 351 in X, out of the 704 that the VDP2 does draw, and 512 real pixels in Y on both the VDP1 and the VDP2. And all at a fluid 60FPS.

The open question is what dynamic lighting would have looked like in one of these 8BPP “Hi-Res” games: Virtua Fighter Remix, Virtua Fighter 2, Virtua Fighter Kids, Last Bronx, DOA or Battle Arena Toshinden URA, the way Virtua Fighter 1 itself had it. With that extra lighting they would surely have been more spectacular, as Tekken 2 or Bloody Roar 2 were on PSX, which did something similar.

The cause may be that the extra lighting calculations would have made the 60FPS target harder to hit, and that without Gouraud it was somewhat harder to vary the brightness of the polygons. The two together perhaps led developers to drop lighting from the equation and focus on maximum FPS and resolution.

Many more games use SS Hi-Res mode, most of them only on the VDP2 and/or in menus, some throughout the game or in a mixed form. Here is a chronological list:

The whole game VDP1 Hi-Res >> VDP2 Normal:

  • Virtua Fighter 1 (1995) → 704×256 Non-Interlaced
  • Virtua Fighter 1 Remix (1995) → 703×255 >> 704×256 Non-Interlaced
  • NBA Live 98 (1997) → 639×239 >> 640×240 Non-Interlaced
  • Battle Arena Toshinden URA (1996) → Fighting game, menus in SD. 703×255 >> 704×256 Double interlaced
  • World League Soccer 98 (1998) → 640×255 >> 640×256 Non-Interlaced

The whole game VDP1 Hi-Res 2x “Y” Resolution >> VDP2 Hi-Res:

  • Virtua Fighter 2 (1995) → 703×511 >> 704×256 Double interlaced
  • Decathlete / Athlete King (1996) → 703×479 >> 704×240 Double interlaced
  • Virtua Fighter Kids (1996) → 703×511 >> 704×256 Double interlaced
  • Vatlva Taikenban (1996) → No System Clipping >> 640×224 Double interlaced
  • Dead or Alive (1997) → 703×479 >> 704×240 Double interlaced
  • Digital Dance Mix Vol.1 Namie Amuro (1997) → 703×479 >> 704×240 Double interlaced
  • Winter Heat (1998) → 703×479 >> 704×240 Double interlaced

Mixed VDP1 Normal or Hi-Res >> VDP2 Hi-Res:

  • Fighting Vipers (1996) → 351×255 >> 704×256 UI HD
  • Fighters Megamix (1996) → 351×255 >> 704×256 UI HD
  • Fighting Illusion K1 Grand Prix (1997) → 351×223 >> 704×224 UI HD
  • D-Xhird (1997) → 351×223 >> 704×224 UI HD
  • All Japan Pro Wrestling Featuring Virtua (1997) → 351×223 >> 704×224 Double interlaced UI HD
  • Elan Doree (1998) → 703×223 >> 704 x 224 Backgrounds UI Double interlaced

Mixed VDP1 Normal 2x “Y” Resolution >> VDP2 Normal or Hi-Res:

  • Last Bronx (1997) → 352×507 >> 704×256 Double interlaced, No Hi-Res VDP1, Yes Hi-Res VDP2.
  • Zero Divide: The Final Conflict (1997) → In menus. 352×479 >> 352×240 Double interlaced, No Hi-Res VDP1, No Hi-Res VDP2.

Only (some element in VDP1) VDP2 Hi-Res in menus or parts of the game:

  • Daytona USA CEE (1996) → VDP2 initial screen
  • Cyber Troopers Virtual-On (1996) → VDP2 initial screen and menus
  • Black Dawn (1996) → VDP2 menus
  • Doom (1997) → VDP2 Logos and initial screen
  • Touge King the Spirits 2 (1997) → VDP2 Initial screen
  • Formula Karts (1997) → VDP2 Menus
  • Sega Touring Car (1997) → VDP2 Initial screen and menus
  • FIFA 98 (1997) → VDP2 menus
  • Panzer Dragoon Saga (1998) → VDP2 initial screen
  • Burning Rangers (1998) → VDP2 Menus
  • Akumajo Dracula X: Gekka no Yasokyoku (aka Castlevania: Symphony of the Night) (1998) → 640×240 Double Interlaced VDP2 pause menu
  • World League Soccer 98 (1998) → 640×511 >> 640×256 Double interlaced in menus
  • Code R (1998) → Menus

Finally, once again I pick three of the best games in this section:

Virtua Fighter 2

Launched on 1995-12-01, at the beginning of the first wave. This game finally put the SS on the map. You can tell that the AM2 team assigned to SS tried to match the finish of its arcade versions as closely as possible. In this title it clearly keeps betting on the maximum resolution the SS allows (703×511), at the cost of dropping from 16-bit to 8-bit color depth and, with it, losing the VDP1 shading functions such as Gouraud, useful for recreating flat or smooth lighting. Perhaps AM2 also avoids shading because it did not want to implement source-based lighting, which it already had in VF1, animated human shadow included; here the shadows become an orthogonal projection, although still in animated human form. Maybe this responds to the desire to reach 60FPS, since lighting calculations, for flat faces and even more so for smooth ones, are very expensive and were difficult to implement on SS at this stage of library development. The game loses the touch of spectacle that lighting gives and, seeing that the arcade had flat lighting, one gets the impression that a remix version adding this detail would have been the perfect finishing touch. Here they follow the line set by VF1 Remix and even go a step further, reducing the polygon count of the characters a little more. In return, they make more intense use of the VDP2 and its backgrounds, making up for the lack of 3D backgrounds with several 2D layers that translate and zoom as needed. The result is unbeatable in the stage with the boat on the river. And of course the rotated plane of the VDP2 is used for the tatami, giving it arcade-level definition. As is obvious, the rest of the arcade content remains intact: fighters, designs, animations, music, sounds… An extra mode or two and a couple of FMVs are added in this SS version.
In processor use, we can already see AM focusing on the SH2 pair (for both 3D and FMV) and setting aside the SCU-DSP, which is reflected in its SGL. Even so, as in many other titles, there is some kind of SCU-DSP use whose purpose is so far unknown to me. The sound DSP shows 80% memory usage. Total use of available memory reaches 85%, with 90% of main RAM, 85% of the VDP2 and still 75% of the VDP1. The FMV quality is acceptable, using DUCK (video and audio) in 15-bit mode with dithering at a good resolution (320×168) and 20FPS.

Last Bronx

Launched on 1997-08-01, at the end of the 3rd wave of Japanese developments, when the level of knowledge of the machine was already high. It was made by AM3, which built a new engine for it, presumably without using anything from AM2. In short, this is a spectacular conversion from Model 2B CRX. It stays at the 600-quad barrier at 60FPS like the rest of the AM2 titles (except VF1 and Remix, at 30FPS), but without Hi-Res on both VDP1 and VDP2: only on the VDP2, though with double interlacing and with the full Y resolution rendered on the VDP1. It still uses only palettes and an 8-bit buffer, even though 16-bit was possible. Perhaps the reason is to guarantee maximum speed for the 60FPS target: this way, when the buffer is erased, the VDP1 can go twice as fast. The real resolution for 3D elements is 351×507, and the final VDP2 output is 704×256 with double interlacing, giving 704×512 for the VDP2 graphics. The geometry, textures, colors and animations are translated perfectly, the animations above all. As usual since VF2, the rotated plane RBG0 is used for the ground, with detail equivalent to the arcade's, and the remaining planes are used for stunning backgrounds: a double rotated plane (floor and ceiling) in some stages, plus selected elements drawn by the VDP1 or by VDP2 planes to give more detail and depth. Again there is no lighting on the geometry, and no animated human shadows are projected; the shadow is a simple circle, but at least it is transparent over the background. That is quite exceptional for the resolution setup this game uses; in theory it could not be done. Moving on. Processor use is very good, with the SH2 tandem well exploited for both 3D and FMV, and with signs of SCU-DSP usage higher than usual for games where it is not clear that it is used for 3D: in this case up to 35% of memory and 20% of registers.
MIDI music is used as well as CD-DA, with the sound DSP at 35% memory usage. The FMV intro is in DUCK format (video / audio) with acceptable resolution (288×200), 15-bit color with dithering and audio (44100 mono), but at 30FPS. Finally, total SS memory use is 80%, reaching 90% on the VDP1, 80% on the VDP2 and 75% of main RAM. Compared with its contemporary Soul Blade, I think Last Bronx is no rival for it. The two go hand in hand in many respects, but the graphical milestone Soul Blade represented among the 32-bit machines is remarkable; even on PS1 itself no other game gets as far, maybe a couple more, and none can replicate the magic and mastery that Namco brought to the machine and to all the titles that came after. Last Bronx is a great first fighting title for AM3, and a great first port for SS. Perhaps with more games of this type behind it and more time with the SS, AM3 would have given the SS its own Soul Blade. But it was not to be.

Dead or Alive

Launched on 1997-10-09, at the end of the 3rd wave of Japanese developments. This title is a real milestone in the handling and use of the SS. The game comes from the Model 2A-CRX / 2B-CRX, hardware of manifest superiority, and Tecmo delivers a sublime port. The models, textures, colors and animations are copied faithfully from the original, except for the lighting and the animated human shadows, and of course the 3D backgrounds. The latter are replaced by several 2D planes of the VDP2, used intelligently enough to go largely unnoticed, except in a stage or two such as the monastery. The floor is again drawn with a rotated plane of the VDP2, with quality equivalent to the arcade's. Final resolution and smoothness are the hallmark of both this game and its conversion; it is the point the two share. It runs at the maximum resolution the SS can manage (VDP1 703×479, final VDP2 output 704×480) at a rock-solid 60FPS. As already discussed, reaching this resolution on the VDP1 means having only 8BPP of color and losing the VDP1 color functions such as Gouraud. The great texture work in DOA makes the absence of source-based dynamic lighting on the models almost imperceptible. As for processor use, this is the only game of its kind that takes advantage of every processor for 3D: the double SH2 plus the SCU-DSP, the latter with 100% memory use and 45.5% of registers. The sound DSP shows 90% memory usage. System memory use is 74%, with 65% of the VDP1, 65% of the VDP2 and 87.5% of main RAM.
Finally, the port has an FMV intro, as was usual. It uses DUCK compression (video / audio) with acceptable resolution (304×176 at 20FPS) and sound (stereo 22Khz), but in 15-bit color with dithering on a VDP2 layer without zoom, wasting both the possibility of full 32-bit color and of scaling to 100% of the screen to make the video as large as possible. The video itself, as is usual with Namco, is of high quality. Compared with the later PSX version, the differences are wide, from the design direction, more modern on PS1, to the kind of spectacle on offer. Both are spectacular, but in totally different ways; both push their machine, but with different goals. The PS1 version uses dynamic lighting, with human shadows and Gouraud shading, and a superb resolution and color depth, 512×480 at 16-bit with double interlace at 60FPS. It does not reach the definition of the SS, but its graphics are more attractive in another sense. It also loses geometry in the fighters and content in the stages, which are considerably flatter, and its ground shows clear perspective errors where the SS's is perfect. In conclusion, two great but very different versions.

3.3 The complicated I – EXTRA Ball III: Tessellation / LOD scenario / Mip Mapping

Tomb Raider always caught my attention here, because the jump the textures made with distance was more than evident. Later, being able to see it in wireframe confirmed my suspicions: the engine uses tessellation / LOD / mip-mapping. I also noticed it in Wipeout 2097, which has a wireframe mode too. And there were many more.

Note that it is difficult to know whether a game is really dividing meshes in real time (tessellation) or whether they were divided beforehand, which would be LOD (precalculated tessellation). The same applies to textures and mip-mapping. If it is done in real time, we spend main RAM on extra code, and calculation cycles on dividing polygons and textures. If we precalculate it, we have to store this data “duplicated or tripled”: in main RAM for the geometry, and in VRAM for the mip textures.
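To give an idea of the storage cost of precalculated mip textures, this sketch adds up the texels of a texture and its reduced copies. It assumes the usual scheme where each level halves width and height; how each SS game actually sized its reduced textures is not something I can confirm, and the function name is illustrative.

```python
def mip_chain_texels(width, height, levels):
    """Total texels stored for a texture plus its reduced copies.

    Assumption: each level halves both dimensions (never below 1 texel).
    """
    total = 0
    for _ in range(levels):
        total += width * height
        width = max(1, width // 2)
        height = max(1, height // 2)
    return total


base = mip_chain_texels(64, 64, 1)    # only the original texture
chain = mip_chain_texels(64, 64, 3)   # original plus two reduced copies
print(base, chain, chain / base)
```

With halving, the extra copies cost roughly a third more memory than the original texture alone; precalculated geometry LODs have no such fixed ratio, which is why I describe them as duplicated or tripled.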

Tessellation and mip-mapping were, again, more accessible on PSX. Its libraries show at least two features aimed at them: 1) “Polygon Division” and 2) “Mip-map Function Library”, both using the GTE plus the GPU. This brings us back to the conclusion that Sega could have done something similar with the 2xSH2 + SCU-DSP + VDP1 tandem, including similar functions in its libraries so that developers could achieve the desired effect efficiently.

In the end, what this “effect” does is subdivide: it increases or decreases the number of polygons of a surface in the scenery or terrain depending on its relative size, according to its distance from the camera, normally based on maximum proximity or distance values. The result is a feeling of more polygons on screen when in fact there are fewer, and a way to cap their number at all times so that the frame rate does not drop dramatically. This effect, together with LOD on objects, effectively optimizes the amount of geometry and textures to be drawn in each frame by the GPU, or by the VDP1 on SS.
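The idea can be sketched as follows: pick a subdivision level from the distance to the camera and split each quad into four through its edge midpoints. The thresholds and all the names are hypothetical, chosen only for illustration; this is not the algorithm of any particular game or library.

```python
def midpoint(a, b):
    """Point halfway between two points of any dimension."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def subdivide(quad):
    """Split one quad (four corners in order) into four smaller quads."""
    a, b, c, d = quad
    ab, bc, cd, da = midpoint(a, b), midpoint(b, c), midpoint(c, d), midpoint(d, a)
    centre = midpoint(ab, cd)
    return [(a, ab, centre, da), (ab, b, bc, centre),
            (centre, bc, c, cd), (da, centre, cd, d)]


def subdivision_level(distance, near=10.0, far=40.0):
    """More subdivision close to the camera, none far away (hypothetical thresholds)."""
    if distance < near:
        return 2
    if distance < far:
        return 1
    return 0


def tessellate(quad, level):
    """Apply the four-way split 'level' times."""
    quads = [quad]
    for _ in range(level):
        quads = [child for parent in quads for child in subdivide(parent)]
    return quads


floor_tile = ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0))
for distance in (5.0, 25.0, 100.0):
    print(distance, len(tessellate(floor_tile, subdivision_level(distance))))
```

Whether the split is done every frame or stored beforehand as separate meshes, the selection by distance works the same way.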

In the end, was this disadvantage overcome?

Yes, in part. There are games where it is clearly visible. Others remain to be discovered, once a current emulator includes a wireframe mode. Below are the games that have this mode as a cheat, followed by those that possibly use the technique too, or something very similar.

With wireframe mode:

  • Tomb Raider (1996) → Subdivision + Mip-map in close, middle and far clipping.
  • Wipeout 2097 (1997) → Subdivision in Medium and Far clipping.
  • Nascar 98 (1997) → Subdivision + Mip-map in close clipping.
  • Courier Crisis (1997) → Subdivision + Mip-map in Middle and Far clipping.

Possible:

  • Wipeout 1 (1996) → Very possible, in PSX it was. Subdivision in medium and distant clipping.

As I have done at the end of each important point, I pick three titles that illustrate it and analyze them, briefly, in more depth:

Tomb Raider

Launched on 1996-10-24, a late game of the 2nd wave of European developments. 3D worlds took an incredible leap that year, and Tomb Raider was partly to blame. I admit that the first time I saw the game, in some PC screenshots in a press release, it did not catch my attention. When I played a demo, I was immediately hooked. At that time I was about to get my first PC, and I bought the full game before I even had it XD. I remember comparing the PC version with the PSX one with a cousin of mine. Much later I was able to enjoy what was the first version of TR, on the 32-bit console I wanted at the time and did not have. And I found an incredible version, for its time and for the machine it ran on. It is true that in certain respects, as one of its developers acknowledged, it came out perhaps too early and with bugs. Even so, you can see that this version received a lot of love from Core Design, and you cannot fault them for the job they did, since it was no simple task.

This is a game where every element has Gouraud shading, something we will not often find during the life of the machine. It also has source-based lighting on characters/enemies/assets, precalculated lighting on the scenery, and lighting animated with color in the water areas.

Let us continue with this game's milestones. It belongs to the high-polycount club: a maximum of 1300 on screen (as seen by me in Yabause, perhaps more), and of course many more calculated, but with FPS drops. The slowdowns occur in large, complex areas, or when dynamically lit enemies and their animations are added.

Add their AI and the sounds triggered at that moment, fast camera movements, etc., and the implemented code and the SS cannot keep up, producing the Vsync slowdowns discussed above. In 60Hz versions: from 30 to 20, from 20 to 15. In 50Hz versions: from 25 to 17, from 17 to 13. I also suspect that sound contributes a lot to the performance drops, maybe through saturation of the B-Bus caused by poorly optimized sound code or data. I have not found a trace of the official SEGA sound driver, so I do not know whether they were using one of their own. The absence of sound settings, plus the problems with the sound, suggests that this part is not as clean as it should be. Still, despite everything, at around 600 polys on screen, with everything that may appear, the game stays at or above 25FPS.
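The steps in those frame rates follow from waiting for Vsync: a frame that misses one refresh has to wait for the next, so the rate can only be the refresh rate divided by a whole number. A minimal sketch, with a hypothetical function name and example frame times chosen only for illustration:

```python
import math


def vsync_fps(refresh_hz, frame_time_ms):
    """Frame rate when every frame waits for the next vertical refresh."""
    refresh_ms = 1000.0 / refresh_hz
    refreshes = max(1, math.ceil(frame_time_ms / refresh_ms))
    return refresh_hz / refreshes


for frame_time_ms in (30, 40, 55):
    print("60Hz", frame_time_ms, "ms ->", vsync_fps(60, frame_time_ms))

for frame_time_ms in (30, 45, 70):
    print("50Hz", frame_time_ms, "ms ->", round(vsync_fps(50, frame_time_ms), 1))
```

At 60Hz the possible rates below 60 are 30, 20 and 15; at 50Hz they are 25, about 16.7 and 12.5, which are the 17 and 13 quoted above once rounded.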

More achievements: subdivision and mip-mapping for distant elements. The first is a very useful technique to reduce the total number of vertices the CPUs have to transform, and the second reduces the texture pixels to be written by the VDP1 or any GPU. PS1 used the technique in the same way, and also to reduce perspective distortion at short distances and to ease the clipping of polygons, resulting in less overdraw. It is true that the technique was not equally practical on both machines: on SS it was harder to adjust the final look of the mip-maps, and they took up more VRAM. But in the end it is always useful for a game that handles such a large number of polygons, and this was one of those that handled the most at the time.

It runs at the maximum resolution at which the VDP1 can still offer Gouraud and 16bit color, 352×256, very similar to the resolution on PC (320x240x8, the only mode that ran reasonably smoothly in software at the time) or PS1 (364×240). The three versions share the level of subdivision, the “depth cueing” effect of fading to black at the far clipping plane, and the distance at which it is applied. So the three versions move the same amount of geometric information. As for color depth, SS and PS1 are both at 16bit, with a smoother Gouraud and a very similar color palette for textures. The difference in the final look comes from how each one rasterizes, the SS being so particular. The maximum texture resolution is the same; only the SS uses Mip-maps, as we have said.

Another achievement on SS: Render-to-texture of the entire VDP1 frame-buffer, which is passed to a VDP2 layer when we enter pause / inventory so it can be used as the background. It is the same effect as in the other versions, but on SS it is not trivial. And it does it very fast; it is practically instantaneous. In essence, it is partly the same effect used for the crystal reflection on PS1, but the PS1 does this almost for free, at a much higher speed than SS or PC.

TR makes, in principle, formidable use of the dual SH2, despite the slowdowns we have discussed. It shows that Core did a great job getting the most out of the two of them. It is not fair to compare this work with what was done on PSX, because more than half of the equivalent work on PS1, with its GTE, was already done by its libraries. As I said, what Core achieved here is remarkable and very respectable, except for the problems that remained. There are signs of SCU-DSP use at the level normally found in other games, 20% of memory and 17.5% of registers; I do not know exactly whether it is being used for something in the 3D pipeline or for something else. Most likely not. On the other hand, the use of the sound DSP is significant: 60% of memory.

The FMVs are of very similar quality across versions, although on PS1 the DCT compression gives slightly better quality. The size and final resolution are the same in all of them: 320×120 at 30FPS. On SS, Cinepak/PCM is used, with acceptable quality, 24bit color and 16bit stereo audio at 22KHz. The problem is that this color depth in the video file is wasted, because playback on the VDP1 is limited to 15bit.
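To illustrate what is lost when a 24bit source is shown through a 15bit frame-buffer, here is a small Python sketch that keeps only the top 5 bits of each 8bit channel. The function names are hypothetical, and the packing order (red in the low bits) is an assumption for illustration, not something taken from the game's video player.

```python
def rgb888_to_rgb555(r, g, b):
    """Keep the top 5 bits of each 8bit channel and pack them into 15 bits."""
    # Assumed layout: blue in bits 10-14, green in bits 5-9, red in bits 0-4.
    return ((b >> 3) << 10) | ((g >> 3) << 5) | (r >> 3)

def distinct_colors(bits_per_channel):
    """Number of representable colors with three equal-width channels."""
    return (1 << bits_per_channel) ** 3

print(distinct_colors(8))  # 16777216
print(distinct_colors(5))  # 32768

# Two different 24bit colors that collapse to the same 15bit value:
print(rgb888_to_rgb555(200, 100, 50) == rgb888_to_rgb555(207, 103, 55))  # True
```

Every group of 8 neighbouring values per channel maps to a single output value, which is why smooth gradients in the source can show banding once displayed.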

The game uses 73% of the SS memory capacity: 95% of the VDP1, 90% of the main RAM and 50% of the VDP2.

Compared with the other versions of the time, both the PC and the PS1 versions have a better finish in the details. But the one that comes out best is definitely Sony’s. The smoothness it displayed at the time was a punch in the face. Where the PC (even with a good machine) and the SS suffered significant drops, the PS1 in the same places held, if not 30FPS, then above 25FPS. It is true that there are points where even the PS1 suffers a little. Tomb Raider was a very demanding game for its time. To be fair to the PC version, it is the one with the least perspective distortion and with no vertex jitter.

Zone where PSX also suffers

Most probably, towards the end of development (the last 2 or 3 months), the Core Design team had to divide its work among the 3 versions. Having reached an agreement with SEGA to release a little earlier, the SS version was the most advanced one, as can be seen in the state of the leaked betas. When the team turned to finishing the other versions, and given that the SS needed more optimization effort than they did, the result is what we have.

At heart, these are three “almost” identical versions, with barely a month between releases. For me, the main weaknesses of the SS version are:

  • Very aggressive Mip-mapping (the far Mip-map is half the resolution of the near one), maybe in search of more performance, although I think the VDP1 was drawing with time to spare. In the July beta it is not like this: the distant Mip-map is twice the resolution of the near one (just the opposite of the final version), and the polygon subdivision is not noticeable the way it is in the other versions. It is true that the performance of this beta is even lower. All this makes me think that the subdivision was perhaps designed more for PC and PS1, to reduce perspective distortion, than for SS, because the way the SS draws already greatly reduces this problem. Still, being able to reduce vertices is attractive, both to cut the amount of information to be processed and to deal with near-clipping at the camera. And I do not know whether this, as implemented, hurts the SS version more than it helps.
  • Lack of optimization in the transformation and lighting code. I have come to the conclusion that this is the real problem of the SS version: it becomes clear when you switch the game to wireframe and the performance does not change.
  • No colored underwater lighting on the characters, as is already done for scenery and assets when you look from the inside out. Judging by the betas, it is something that was also being implemented in the other versions. On SS it was left out.
  • Some corrupt textures in the EU version (the first one), corrected in the USA and Japanese versions.
  • Texture problems on flat polygons with textures and masks, which are duplicated on both sides. In the beta, this problem does not exist:
    • Piano Gym: corrected in the USA and Japanese versions.
    • First level wooden door: not corrected. If you look at it from behind, it looks fine.
  • Problems with discarding hidden faces (Backface Culling):
    • Balustrades in the Lara’s house level. Only on SS are the ones at the back drawn.
    • Wooden bridges that show the inner face of the wood. This results in more textured polygons being drawn.
  • Sound problems: noise and strange panning (L / R).
  • Long load times and no load bar.
  • Excessive Z-fighting in plants / billboards.
  • Excessive jitter and flickering in level textures, even more than on PS1.
  • Transparencies in shadows (with a Scaled Sprite) and in flat particles such as waterfall splashes: being flat, there was no overdraw problem.
  • Loading screen for Lara’s home missing. It is present in the betas. Corrected in the USA and Japanese versions.
  • No sound configuration. In the beta you can see and interact with the options, but they do not work.
  • Missing collisions on some assets. Corrected in the USA and Japanese versions.
  • FMV full screen and 24bit color.
  • More spectacular effects:
    • Like the distortion of water, but better finished.
    • The reflection on the save crystal (rhombus), also possible on SS, with limitations.
  • Greater use of the VDP2, both in the 3D game (UI, backgrounds) and in FMV (to show the video’s colors at the VDP2’s maximum of 32bit color).
  • Lara’s handstand animation missing.
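For the Backface Culling item in the list above, this is the kind of test a renderer typically applies before sending a quad to be drawn: compute the signed area of the projected quad and skip it when the winding says it faces away. The Python sketch below is a generic illustration, not Core's code; the function names and the "positive area means front-facing" convention are assumptions.

```python
def signed_area(points):
    """Twice the signed area of a 2D polygon (shoelace formula)."""
    total = 0
    for i in range(len(points)):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return total

def is_front_facing(screen_quad):
    # Assumption: front faces are wound so that the signed area is positive.
    return signed_area(screen_quad) > 0

quad = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(is_front_facing(quad))        # True
print(is_front_facing(quad[::-1]))  # False, same quad seen from behind
```

When this test is skipped or given the wrong winding, back faces such as the far balustrades or the inside of a bridge get drawn, adding textured polygons for no visual gain.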

To finish, an anecdote. I was surprised that Tomb Raider did not use High Speed Shrink. At first I thought it was a great missed opportunity to gain performance. Later I discovered that, while it would be a gain, it would be a small one, and that the problem of TR on SS is not in the VDP1, which I think has headroom (even with a less aggressive Mip-Map), but in the transformation and lighting code, as we have already discussed.

Nascar 98

Released on 1997-11-13, at the end of the 3rd wave of USA 3rd-party developments. To better understand this latest Nascar game from EA, it is necessary to understand where a large part of its code comes from. The origin of Nascar 98 is Andretti Racing, released a year earlier. It was a port by Press Start of a Stormfront development, a full project for PC and PS1. Already in that game the differences between the PC / PS1 tandem and the SS were evident. What happened is that, a year later, this game was built on that base. On PC / PS1, new details were developed, some bigger than others. On SS, these simply did not arrive, and others were already missing in the previous game. They did not even try to add what had been left out of Andretti Racing. Let’s make a small comparative breakdown of what Andretti Racing and Nascar 98 on SS left out or did worse than the other versions:

  1. Original developments for PSX by Stormfront; port by Press Start Inc. with 5 people in total, 1 programmer. Development time unknown.
  2. Menus in SD in both.
  3. Both with ADPCM sound.
  4. Over 1000 quads at 20FPS on SS. On PSX I estimate at least 1500 at 20 / 25FPS.
  5. The draw distance is shorter on SS, but it has a rather aggressive Mip-Map.
  6. It has no rear-view mirror. On PSX, though, the draw distance is reduced to the SS level, or a little further, when you enable it.
  7. There is no Render-to-texture for a giant screen on any circuit, which the PSX version has.
  8. Released one month apart from the PSX version.
  9. The cars keep the same polygons, without Gouraud, in both.
  10. The horizon background does not line up correctly in Y.
  11. Same resolution during the 3D game, textures and color.
  12. Sound with reverberation in tunnels.
  13. The FMV is almost equal to PSX (BPP, compression, audio and FPS), except for a slightly lower resolution in Y. It is shown on a VDP2 layer, and it does not use the zoom to stretch the video to full screen. Why not? I do not know. It uses a very good proprietary EA codec that makes use of the SCU-DSP in both games.
  14. Same playable content and other gameplay features, except the rear-view mirror.

Now, specifically between the PC / PSX and SS versions of Nascar 98:

  1. Small animated shine effect only on PSX.
  2. Cars have fewer polygons at the front.
  3. Static Gouraud on all cars on PSX; none on SS.
  4. Has tessellation of the circuit near the close clipping plane, 1/2.
  5. Distant cars: on PSX, the LOD is untextured with Gouraud; on SS, the LOD has less geometry but the same texture as the near LOD.
  6. In the cockpit view: no hands, no animated bar shadows and no lens flare from the sun.
  7. The fences on PSX are transparent, like glass; on SS they are a grid texture.

In the year between the two projects there is a big difference in how each machine’s version evolved. On PSX there is progress in the engine and its optimization, use of Gouraud, a cockpit and better effects. In short, they learned and went further. On SS, nothing. All the same.

  • The 3D does not use the SCU-DSP.
  • They do not improve the code they had a year earlier for the 2xSH2.
  • They do not learn to use the VDP2 better for backgrounds.
  • Nor the VDP1 for Gouraud.

Anyway… nothing. Did they not have all that time? Did they wait for the PC / PSX code and convert it at the end? I do not know, but it does not represent what the SS could do.

As for the rest of the data: like the other EA titles for SS, including Andretti Racing, Nascar uses EA’s magnificent proprietary DCT codec. It makes intensive use of the two SH2 plus the SCU-DSP, reaching 90% of the memory and 39.2% of the registers, a very high usage figure. The sound DSP goes from 35% (FMV / menus) to 70% (3D game) of its memory, with signs of a reverberation effect in use. And although there is some CD-DA track, I am almost sure there is ADPCM sound in both titles, as there are files with “adp”-type extensions, and it is known that EA had a proprietary ADPCM-based audio codec at this time. As for available memory, it uses 75% of the total, also reaching 75% of each of the VDP1, the main RAM and the VDP2.

Courier Crisis

Released on 1997-12-20, also at the end of the 3rd wave of USA developments. A fully multiplatform title. It is clear that it was developed at the same time, and with both machines in mind, for SS and PSX. They share everything, pros and cons, except for some typical details and a few curious ones. They share:

  • Same resolution and color.
  • Subdivision and Mip-map.
  • Drawing distance.
  • Frame rate unlocked and very unstable. Between 60-20FPS. An average of 30 / 25FPS.

They do not share:

  • SS has Depth Cueing, PSX does not.
  • SS has a bit more drawing distance than PSX.
  • Transparencies. On SS, some in menus using VDP1 > VDP2.
  • Different FMV codec, but equal quality in resolution and, through scaling, in screen area. SS: DUCK / PCM, VDP1 at 15bit color with dithering, at 15FPS, mono audio.
  • Slightly faster loading times on SS.

As for processor usage, we find standard use of the two SH2 and interesting signs of SCU-DSP use, reaching 23% of memory and 26% of registers, though I do not know exactly whether it is for FMV or 3D. The sound DSP is at 35% memory usage. Finally, the game uses 83% of the available memory: 95% of the VDP1, 50% of the VDP2 and 85% of the main RAM.

End of the second part.

8 comments

  1. Regarding the VDP1 fillrate calculation equation:
    k + (x * y * l) + (y * n)

    – k is the setup time, the amount of time it takes to read the draw command, set up all the internal registers, read the gouraud shading table (if there is any), set that up, etc. Playstation should also have a value for this, but I don’t know the exact figures.
    – x / y are the sprite (texture) dimensions. Important to note that since this is a 2-dimensional value (X * Y), it includes the pixel overdraw.
    – n is, according to Steve Snake, an approximate of the sdram cycle miss penalty. It is not an exact calculation, but a good enough general approximation.
    – l is 3 cycles, for reading a texture, processing a texture, and writing a texture pixel. If you draw untextured polygons, you should only have 2 cycles here because you only need to process and write a pixel, no need to read textures. But this is just conjecture.
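The equation above can be turned into a small calculator to get a feel for how the terms weigh against each other. This is only a sketch of the commenter's approximation in Python: the function name is hypothetical, and the values used for k and n are placeholders, since no figures are given for them.

```python
def vdp1_cycles(x, y, k, n, l=3):
    """Estimated cycles for one sprite: k + (x * y * l) + (y * n)."""
    return k + (x * y * l) + (y * n)

# Per-pixel term only (k = 0, n = 0) for a 16x16 textured sprite:
print(vdp1_cycles(16, 16, k=0, n=0))       # 768
# Same sprite untextured, using the conjectured 2 cycles per pixel:
print(vdp1_cycles(16, 16, k=0, n=0, l=2))  # 512
# With placeholder setup and miss-penalty values (hypothetical numbers):
print(vdp1_cycles(16, 16, k=50, n=4))      # 882
```

Because the x * y term covers the whole source texture, the cost follows the texture size rather than the number of pixels that end up visible on screen.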

    I think what the VDP1 does is use 1 cycle to read a texture to an internal register, 1 cycle to process it, and 1 cycle to write out the data from the internal register to the framebuffer. This is very inefficient. Yes, this does cut the max fillrate to less than 9,56 MPixel/sec. I don’t know why it does this, maybe a limitation due to the nature of SDRAM, or they just didn’t know how to do it otherwise. This also explains why semitransparency can be so extremely slow. But for semitransparency, the “up to 6 times” figure may have been a best case versus worst case comparison, best case being untextured pixel, worst case being textured, gouraud shaded, semi-transparent pixel.

    I don’t know exactly how the Playstation textures, but from the fillrate numbers it is evident that it can draw a textured pixel in 1 cycle (33,8 MPixel/sec). Maybe this was due to it using VRAM which is double ported (later SGRAM which could simulate this), maybe because it was just a better chip, maybe both.
    For the 66MPixel value, it can only achieve this by directly copying pixels in memory without processing. I don’t know if the lack of processing is what doubles the speed, or the fact that such data is always rectangular so it can make 32bit read/writes safely. But you get no shading, no lighting, no deformation, only copy rectangular textures at twice the speed.

    By the way you can do gouraud shading in 8-bit palette mode for Saturn. You have to use red colour only gouraud shading on a palette pixel. The 5 bits for the red shading correspond to the palette entry of the pixel. So this way you can ramp up/down the palette index using the VDP1, doing hardware lighting. The downside is that you have to pre-calculate the lighting gradients in the palettes, losing you colour count. The upside is that you can change the lighting colour by updating the palette. I think the cubes in the CD player do this to change colour.
    You could even set custom colour gradients, like inversions, to do crazy things like simulate bump mapping. But only one tech demo of that exists.
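The palette trick described above can be pictured as follows: each base colour gets a pre-calculated ramp of shades in the palette, and the red Gouraud value moves the pixel's palette index up or down that ramp. The Python sketch below only models the idea. The 32-entry ramp, the signed offset and the clamping are assumptions for illustration, not a description of the exact VDP1 arithmetic.

```python
def build_ramp(base_rgb, steps=32):
    """Pre-calculated lighting gradient for one base colour, dark to bright."""
    r, g, b = base_rgb
    last = steps - 1
    return [(r * i // last, g * i // last, b * i // last) for i in range(steps)]

def shade_index(index, red_offset, ramp_len=32):
    """Move along the ramp by a signed red Gouraud offset, clamped (assumed)."""
    return max(0, min(ramp_len - 1, index + red_offset))

palette = build_ramp((248, 128, 0))
print(palette[shade_index(16, -8)])  # darker entry of the ramp
print(palette[shade_index(16, +8)])  # brighter entry of the ramp
```

Replacing the contents of `palette` with a ramp built from another base colour changes the light colour without touching the geometry, which is the upside mentioned in the comment.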

    I don’t think CPU usage can be measured in a percentage, because you also need to spend time using the bus or copying data. Maybe using the Slave SH2 or the SCU DSP at 100% would have made the Master SH2 too busy to process their data, leaving it less time to crunch numbers. A single usage percentage would not account for this. You also need time to upload draw commands to the system. And memory usage is even trickier because half the main memory is slower, and you might not have anything relevant in your game code for that, so you leave parts of it empty.

    The percentage calculation is also misleading for texture usage, because textures and draw commands share VRAM. Have too much of one, and the others will suffer. You will end up needing to upload commands twice per frame or upload texture more often between frames, and while you do that, the VDP1 can’t draw anything (you are hogging the VRAM from it by uploading). So you have to balance things out, which leads to lower memory usage on average, to allow for higher peak values in poly count.

    Great job on writing up a list of which Saturn games use which texture modes! That’s really fascinating!

    And for the most graphically impressive Saturn games, I’d recommend you check out Scorcher. It uses very detailed and colourful textures (easily Playstation quality texturing), semitransparency (rectangular only to avoid the pixel overwrite bug), and decent speed. It is also one of the only games to use both rotating playfields of the VDP2, one of them being transparent. I think each level uses more textures than what fits in the memory, and the game is uploading it dynamically, which explains both the checkpoints and why sometimes they take longer than usual, and a very rare bug where if you race too fast, the game stops texturing for a second. For shading, it only has static pre-baked lighting on the level textures (plus fogging), but nonetheless it looks great because it has so many colours on screen.

  2. Nice article. Emulators certainly are a great way to better understand how these games were programmed, and how they performed.

    However, you should use caution when measuring the framerate of a game in an emulator : they are not accurate enough to match exactly the framerate of the same game on a real console. For example, you list Panzer Dragoon Zwei as 20 FPS, but on console it runs at 30 FPS in game and 60 FPS during cutscenes.

    Also, I would add another feature that games could use to push the console to the max: using the higher clock rate of 28 MHz instead of the default one at 26 MHz. Increasing that clock makes both SH2, VDP1, VDP2, SCU and their memory faster. VDP2 automatically displays 10 % more pixels horizontally (352 instead of 320 in SD), but you don’t have to draw the additional pixels. That way you can cover the same screen area as at the lower clock rate, but with more cycles per pixel.
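A quick way to put numbers on this suggestion, using the rounded 26 and 28 MHz figures from the comment (the exact hardware clocks differ slightly, so treat the result as indicative). The Python function name below is hypothetical.

```python
def cycles_per_pixel_gain(low_mhz=26.0, high_mhz=28.0):
    """Extra cycles per pixel if the drawn area stays the same size."""
    return high_mhz / low_mhz - 1.0

print(f"{cycles_per_pixel_gain():.1%}")  # 7.7%
print(f"{352 / 320 - 1:.1%}")            # 10.0%
```

The first figure is the extra time budget gained per pixel when the game keeps drawing the same area. The second is the extra horizontal width that comes with the faster mode and that, according to the comment, does not have to be filled.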

    Strangely enough, all 3 Lobotomy Software FPS run at the lower clock rate. There’s always room for improvement, even in well programmed games…
