Programming DirectX games
In this lesson, you will create a command queue and a command list and learn how to synchronize CPU and GPU operations in order to correctly implement N-buffered rendering. The primary reason for the changes introduced in DirectX 12 is the demand from the gaming industry for a rendering SDK that gives more power and control to the graphics programmer. Vendor-specific driver implementations were often complex and imposed a CPU performance overhead that the developer had no control over.
Much of this overhead can be avoided by giving control back to the developers. One example of the driver overhead present in previous versions of the DirectX SDK is resource management. Drivers needed to track the lifetime of every resource that was used by the rendering pipeline.
Tracking of resources by the driver is often unnecessary if it can be assumed that the application programmer can perform this task with much less overhead.
Providing the developers with the tools to implement their own resource management takes that responsibility away from the driver implementation and often allows for performance improvements if done correctly. But with great power comes great responsibility. As with all things, the first time you encounter something it may seem daunting or too difficult to learn, but if you are persistent in your desire to learn this new SDK, the rewards will be well worth it.
The previous versions of the DirectX SDK will still work, but if you are either looking for a job in the game industry or just trying to update your knowledge and skills in graphics programming, you will need to learn the DirectX 12 SDK. This lesson makes no assumptions about your current skill level and assumes you have never written a graphics application before. The various components of the DirectX API provide low-level access to the hardware running on Windows-based operating systems [6].
The first version of DirectX was not released at the same time as Windows 95 but shortly after it, in September [6]. Over the following period, the DirectX library went through several version changes to reach version 5. Subsequent major revisions were released on an annual basis until DirectX 9, which was released two years after DirectX 8 [6].
Shader Model 1 [9], which arrived with DirectX 8, was the first shader model and introduced vertex and pixel shaders to the programmable pipeline. DirectX 9 extended shader support up to Shader Model 3. Shader Model 4 introduced the geometry shader, which allows the graphics programmer to create new geometric primitives from simpler primitives, for example by taking a single point as input and producing a set of triangles. DirectX 11 was released in October and introduced Shader Model 5, which added tessellation shaders and compute shaders. Tessellation shaders provide the ability to dynamically refine the level of detail of a model, for example by computing the triangle primitives from the control points of a Bezier surface, although other tessellation techniques can also be implemented in the tessellation shader.
Compute shaders allow the graphics programmer to create general-purpose programs that take advantage of the massive parallelism of the Graphics Processing Unit (GPU). Texture arrays were already possible prior to Shader Model 5.
Using descriptor arrays allows textures of varying dimensions and storage formats to be accessed from a single shader variable. Shader Model 6 was released on April 11, together with the Windows 10 Creators Update.
Shader Model 6 added wave-level intrinsic functions.
The API that is concerned with hardware-accelerated 3D graphics rendering is called Direct3D and is the primary subject of this article. Direct2D is a hardware-accelerated, immediate-mode 2D graphics API that provides high-performance and high-quality rendering for 2D geometry, bitmaps, and text. DirectWrite supports high-quality text rendering, resolution-independent outline fonts, and full Unicode text and layouts. XInput replaces DirectInput.
The DirectX 12 graphics pipeline consists of several stages. The following diagram illustrates the various stages of the DirectX 12 graphics pipeline. The arrows indicate the flow of data from each stage of the graphics pipeline as well as from memory resources such as buffers, textures, and constant buffers that are available in high-speed GPU memory.
Figure: DirectX 12 Graphics Pipeline [13]. The blue rectangular blocks represent the fixed-function stages, which cannot be modified programmatically. The green rounded-rectangular blocks represent the programmable stages of the graphics pipeline.
The first stage of the graphics pipeline is the Input-Assembler (IA) stage. The purpose of the input-assembler stage is to read primitive data from user-defined vertex and index buffers and assemble that data into geometric primitives (line lists, triangle strips, or primitives with adjacency data).
The Vertex Shader (VS) stage is responsible for transforming the vertex data from object-space into clip-space. The vertex shader can also be used for performing skeletal animation or computing per-vertex lighting. The vertex shader takes a single vertex as input and outputs the clip-space position of the vertex. The vertex shader is the only shader stage that is absolutely required in order to define a valid pipeline state object [15].
The Hull Shader (HS) stage is an optional shader stage and is responsible for determining how much an input control patch should be tessellated by the tessellation stage [14].
The Tessellator Stage is a fixed-function stage that subdivides a patch primitive into smaller primitives according to the tessellation factors specified by the hull shader stage [14].
The Domain Shader (DS) stage is an optional shader stage that computes the final vertex attributes based on the output control points from the hull shader and the interpolation coordinates from the tessellator stage [14]. The input to the domain shader is a single output point from the tessellator stage and the output is the computed attributes of the tessellated primitive.
The Geometry Shader (GS) stage is an optional shader stage that takes a single geometric primitive as input (a single vertex for a point primitive, two vertices for a line primitive, and three vertices for a triangle primitive) and can either discard the primitive, transform it into another primitive type (for example, a point into a quad), or generate additional primitives.
The Stream Output (SO) stage is an optional fixed-function stage that can write primitive data from the pipeline back to GPU memory. This data can be recirculated back to the rendering pipeline to be processed by another set of shaders. This is useful for spawning or terminating particles in a particle effect: the geometry shader can discard particles that should be terminated or generate new particles if particles should be spawned.
The Rasterizer Stage (RS) is a fixed-function stage that clips primitives to the view frustum and performs primitive culling if either front-face or back-face culling is enabled. The rasterizer stage also interpolates the per-vertex attributes across the face of each primitive and passes the interpolated values to the pixel shader.
The Pixel Shader (PS) stage takes the interpolated per-vertex values from the rasterizer stage and produces one or more per-pixel color values. The pixel shader is invoked once for each pixel that is covered by a primitive [15].
The Output-Merger (OM) stage combines the various types of output data (pixel shader output values, depth values, and stencil information) with the contents of the currently bound render targets to produce the final pipeline result.
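The object-space to clip-space transform performed by the vertex shader can be illustrated on the CPU. The following is a minimal sketch, not shader code and not part of the Direct3D API: the `Vec4` and `Mat4` types and the helper functions are hypothetical, the column-vector convention (clip = M * v) is an assumption, and the projection maps depth to the range 0 to 1 after the divide by w.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical helper types for illustration only.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][col]

Mat4 Identity() {
    Mat4 m{};  // all elements start at zero
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

// Model matrix that moves an object by (x, y, z).
Mat4 Translation(float x, float y, float z) {
    Mat4 m = Identity();
    m[0][3] = x;
    m[1][3] = y;
    m[2][3] = z;
    return m;
}

// Left-handed perspective projection; view-space z in [zNear, zFar]
// ends up in [0, 1] after the divide by w.
Mat4 Perspective(float fovY, float aspect, float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear);
    const float h = 1.0f / std::tan(fovY * 0.5f);
    Mat4 m{};
    m[0][0] = h / aspect;
    m[1][1] = h;
    m[2][2] = zFar / (zFar - zNear);
    m[2][3] = -zNear * zFar / (zFar - zNear);
    m[3][2] = 1.0f;  // copies view-space z into clip-space w
    return m;
}

Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// What a vertex shader does per vertex: clip = MVP * objectSpacePosition.
Vec4 Transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}
```

With an identity view matrix, `Multiply(Perspective(...), Translation(...))` plays the role of the combined model-view-projection matrix, and `Transform` applies it to one vertex at a time, which mirrors the one-vertex-in, one-position-out contract of the vertex shader.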
One of the more difficult concepts for beginning DirectX 12 programmers to understand is synchronization. In earlier versions of DirectX and in OpenGL, there was no need to be concerned with GPU synchronization in order to get the GPU to render something; it was usually handled by the driver and required little to no intervention from the graphics programmer.
If GPU synchronization is not handled correctly, the programmer will receive errors from the DirectX debug layer that will be difficult to understand and debug. GPU synchronization is also very important to understand when performing resource management. Resources cannot be freed if they are currently being referenced in a command list that is being executed on a command queue.
It is only safe to release those resources after the command queue has finished executing any command list that references them. Before going into too much detail about GPU synchronization, a few terms that may not be familiar are described.
The Fence object is used to synchronize commands issued to the Command Queue. The fence stores a single value that indicates the last value that was used to signal the fence. Although it is possible to use the same fence object with multiple command queues, doing so does not reliably ensure the proper synchronization of commands across command queues.
Therefore, it is advised to create at least one fence object for each command queue. Multiple command queues can wait on a fence to reach a specific value, but the fence should only be allowed to be signaled from a single command queue. In addition to the fence object, the application must also track a fence value that is used to signal the fence.
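The relationship between the fence object and the fence value tracked by the application can be modeled without a GPU. The sketch below is a CPU-only stand-in, not the Direct3D 12 API: `MockFence`, `MockCommandQueue`, and `GpuStep` are hypothetical names, and the "GPU" is assumed to process exactly one queue entry per step.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a fence: it stores only the last signaled value.
struct MockFence {
    std::uint64_t completedValue = 0;
};

// Hypothetical stand-in for a command queue that owns its own fence value counter.
class MockCommandQueue {
public:
    explicit MockCommandQueue(MockFence& fence) : fence_(fence) {}

    // Queue a command list; returns immediately, nothing runs yet.
    void ExecuteCommandList() { pending_.push_back(Entry{false, 0}); }

    // Append a signal to the end of the queue and return the value to wait for.
    std::uint64_t Signal() {
        const std::uint64_t value = ++nextFenceValue_;
        pending_.push_back(Entry{true, value});
        return value;
    }

    bool IsFenceComplete(std::uint64_t value) const {
        return fence_.completedValue >= value;
    }

    // Simulate the GPU consuming one queue entry; returns false if the queue is empty.
    bool GpuStep() {
        if (pending_.empty()) return false;
        const Entry entry = pending_.front();
        pending_.pop_front();
        if (entry.isSignal) fence_.completedValue = entry.fenceValue;
        return true;
    }

    // Stand-in for a blocking CPU wait: let the "GPU" run until the value is reached.
    void WaitForFenceValue(std::uint64_t value) {
        while (!IsFenceComplete(value) && GpuStep()) {
        }
    }

private:
    struct Entry {
        bool isSignal;
        std::uint64_t fenceValue;
    };
    MockFence& fence_;
    std::deque<Entry> pending_;
    std::uint64_t nextFenceValue_ = 0;  // the application-side fence value
};
```

The point of the model is that the fence only reaches a value after everything queued before the corresponding signal has been consumed, and that the counter used for signaling lives with the queue rather than with the fence.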
A Command List is used to issue copy, compute (dispatch), or draw commands. In DirectX 12, commands issued to the command list are not executed immediately like they are with the DirectX 11 immediate context. All command lists in DirectX 12 are deferred; that is, the commands in a command list are only run on the GPU after they have been executed on a command queue.
The Command Queue in DirectX 12 has a very simple interface. The Render method of the pseudo-code example is responsible for rendering the scene. It does this by first populating the command list that contains all of the draw or compute commands that are needed to render the scene. The resulting command list is then executed on the command queue using the ExecuteCommandList method. The call to the ExecuteCommandList method does not block the calling thread: it does not wait for the commands in the command list to be executed on the GPU before it returns to the caller.
The Signal method will append a fence value to the end of the command queue. In other words, the completed value for the fence object will be set to the specified fence value only after all of the commands that were executed on the command queue prior to the Signal have finished executing on the GPU.
The call to Signal does not block the calling thread but instead just returns the value to wait for before any writable GPU resources that are referenced in the command lists can be reused. The Present method causes the rendered result to be presented to the screen. In this pseudo-code example, the Present method returns the index of the next back buffer within the swap chain to render to. For this reason, the back buffer resource from the previous frame cannot be reused until the image has been presented to the screen.
To prevent a resource from being overwritten before it is presented to the screen, the CPU thread needs to wait for the fence value of the previous frame to be reached.
DirectX 12 defines three different command queue types: copy, compute, and direct. Although the DirectX 12 API defines these three different command queue types, it is not necessarily the case that the GPU in your computer actually has three physical work queues.
It may also be the case that the GPU has one dedicated work queue for each of these types, and it may even have multiple work queues of each type. If you decide to create multiple queues in your own applications, you should allocate one fence object and track one fence value for each allocated command queue.
Figure: An example of performing GPU synchronization.
In the image above, several commands are issued on the main thread. In this example, the first frame is denoted Frame N. The command lists are executed on the command queue. Immediately after executing the command lists, the queue is signaled with the value N. When the command queue reaches that point, the fence will be signaled with the specified value. Since there were no commands in the command queue in Frame N-1, execution continues without stalling the CPU thread.
In this case, the CPU has to wait until signal N is reached, which indicates that the command queue is finished with those resources. This example demonstrates a typical double-buffered scenario. You might think that using triple-buffering for rendering will reduce the amount of time the CPU has to wait for the GPU to finish its work.
However, whenever the CPU is faster at issuing commands than the command queue is at processing those commands, the CPU will have to stall at some point in order to allow the command queue to catch up to the CPU.
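The frame loop and the stall behaviour described above can be modeled on the CPU alone. This is a sketch under stated assumptions rather than a real renderer: `FrameLoopModel` is a hypothetical class, back buffers are assumed to be used in round-robin order, and the "GPU" is assumed to be slower than the CPU, so it only makes progress when the CPU is forced to wait for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-only model of an N-buffered render loop.
class FrameLoopModel {
public:
    explicit FrameLoopModel(std::size_t bufferCount) : fenceValues_(bufferCount, 0) {
        assert(bufferCount >= 2);
    }

    void RenderFrame() {
        // 1. Populate and execute the command list for the current back buffer
        //    (omitted: the model only tracks fence values).

        // 2. Signal: remember which fence value marks the end of this frame's work.
        fenceValues_[frameIndex_] = ++nextFenceValue_;

        // 3. Present: assumed to hand back the next back buffer in round-robin order.
        const std::size_t next = (frameIndex_ + 1) % fenceValues_.size();

        // 4. Wait until the GPU has finished the frame that last used that buffer.
        if (completedValue_ < fenceValues_[next]) {
            ++stallCount_;                         // the CPU would block here
            completedValue_ = fenceValues_[next];  // the GPU catches up while we wait
        }
        frameIndex_ = next;
    }

    int StallCount() const { return stallCount_; }
    std::uint64_t CompletedValue() const { return completedValue_; }

private:
    std::vector<std::uint64_t> fenceValues_;  // one tracked fence value per back buffer
    std::uint64_t nextFenceValue_ = 0;        // value used for the next signal
    std::uint64_t completedValue_ = 0;        // what the fence has reached so far
    std::size_t frameIndex_ = 0;
    int stallCount_ = 0;
};
```

With two buffers the model stalls on the second frame; with three buffers the first stall moves to the third frame. Under the slow-GPU assumption, extra buffers only postpone the first stall, which matches the observation that the CPU must eventually wait when it issues commands faster than the queue can process them.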
It gets more complicated if you add an additional queue. In this case, you must be careful not to signal the second queue with a fence value that is larger than, but could be completed before, a fence value that was used on another queue using the same fence object. Doing so could result in the fence reaching the fence value from the other queue before the main queue has reached the earlier fence value.
Figure: Incorrect synchronization with multiple queues.
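The hazard of signaling one fence from two queues can be reproduced in a small simulation. The sketch below is a hypothetical model, not Direct3D code: each `QueueModel` has some remaining work measured in abstract ticks and writes its signal value to a fence when that work is done, and `ReleasedTooEarly` reports whether a CPU waiter for the owner queue's value would be released while the owner queue is still busy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fence model: holds the last value written to it.
struct FenceModel {
    std::uint64_t completedValue = 0;
};

// Hypothetical queue model: finishes after `remainingWork` ticks,
// then signals `signalValue` on the fence it was given.
struct QueueModel {
    int remainingWork;
    std::uint64_t signalValue;

    bool Finished() const { return remainingWork == 0; }

    void Tick(FenceModel& fence) {
        if (remainingWork > 0 && --remainingWork == 0) {
            fence.completedValue = signalValue;
        }
    }
};

// Returns true if a waiter for owner.signalValue would wake up while
// the owner queue still has work in flight.
bool ReleasedTooEarly(QueueModel owner, QueueModel other, bool shareFence) {
    FenceModel ownerFence;
    FenceModel otherFence;
    FenceModel& otherTarget = shareFence ? ownerFence : otherFence;

    while (!owner.Finished()) {
        owner.Tick(ownerFence);
        other.Tick(otherTarget);
        if (ownerFence.completedValue >= owner.signalValue && !owner.Finished()) {
            return true;  // the fence passed the awaited value too soon
        }
    }
    return false;
}
```

For example, if the main queue needs three ticks and signals 1 while a second queue needs one tick and signals 2, sharing the fence releases the waiter after the first tick, whereas giving each queue its own fence does not.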
The moral of the story is to make sure that every command queue tracks its own fence object and fence value and only signals its own fence object. To be safe, the fence value for a queue should never be allowed to decrease. Even if the command queue is signaled many times per frame and the game renders at a high frame rate, so that the queue is signaled tens of thousands of times per second, the game could run for a very long time before the fence value overflows.
In order to follow along with this tutorial series, you should ensure that you have the following software installed on your computer.
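To put the fence value overflow estimate mentioned above into numbers, the arithmetic can be written out. This is a rough sketch: fence values in Direct3D 12 are 64-bit unsigned integers, but the function name and the signal rates used in the usage note are illustrative assumptions, not figures taken from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Approximate number of years before a 64-bit fence value wraps around,
// assuming the value is incremented once per signal at a constant rate.
double YearsUntilFenceOverflow(std::uint64_t signalsPerFrame,
                               std::uint64_t framesPerSecond) {
    assert(signalsPerFrame > 0 && framesPerSecond > 0);
    const double signalsPerSecond =
        static_cast<double>(signalsPerFrame) * static_cast<double>(framesPerSecond);
    const double maxFenceValue =
        static_cast<double>(std::numeric_limits<std::uint64_t>::max());
    const double secondsPerYear = 365.0 * 24.0 * 60.0 * 60.0;
    return maxFenceValue / signalsPerSecond / secondsPerYear;
}
```

With the assumed rates of 100 signals per frame at 300 frames per second (30,000 signals per second), `YearsUntilFenceOverflow(100, 300)` comes out at roughly 19.5 million years, so overflow is not a practical concern as long as the value only ever increases by one per signal.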
In the following sections, we will create the DirectX 12 demo application. In this tutorial, the demo will only create a window and clear the screen.
Prerequisites for developing with DirectX. When you start to develop a Windows app using DirectX, keep the prerequisites on this page in mind.
This includes the technologies you need to know before you dive in. Get started with DirectX for Windows. In this lesson a simple root signature is created that defines a single constant buffer that contains the Model-View-Projection (MVP) matrix that is used to rotate a model in the scene. This is the first lesson in a series of lessons to teach you how to create a DirectX 12 application from scratch.
In this lesson, you will learn how to query for DirectX 12 capable display adapters that are available, create a DirectX 12 device, create a swap chain, and present the swap chain back buffer to the screen. In this post, Volume Tiled Forward Shading rendering is described.
Similar to Clustered Shading, Volume Tiled Forward Shading builds a 3D grid of volume tiles (clusters) and assigns the lights in the scene to the volume tiles. Only the lights that intersect the volume tile for the current pixel need to be considered during shading.
Simply set the file properties to use the content pipeline and configure the settings. Visual Studio will perform the format conversions for you at build time.