Android-x86

external-mesa: Commit

external/mesa


Commit MetaInfo

Revision: 8b618beb4ec6254dbe90504177a573269a940c10 (tree)
Time: 2019-02-02 23:25:16
Author: Chih-Wei Huang <cwhuang@linu...>
Committer: Chih-Wei Huang

Log Message

Merge remote-tracking branch 'mesa/18.3' into oreo-x86

Change Summary

Diff

--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
1-18.3.2
1+18.3.3
--- a/bin/.cherry-ignore
+++ b/bin/.cherry-ignore
@@ -2,3 +2,15 @@
22 c02390f8fcd367c7350db568feabb2f062efca14 egl/wayland: rather obvious build fix
33 # fixes: The commit addresses b4476138d5ad3f8d30c14ee61f2f375edfdbab2a
44 ff6f1dd0d3c6b4c15ca51b478b2884d14f6a1e06 meson: libfreedreno depends upon libdrm (for fence support)
5+
6+# fixes: This commit requires commits aeaf8dbd097 and 7484bc894b9 which did not
7+# land in branch.
8+f67dea5e19ef14187be0e8d0f61b1f764c7ccb4f radv: Fix multiview depth clears
9+
10+# stable The commits aren't suitable in their present form.
11+bfe31c5e461a1330d6f606bf5310685eff1198dd nir/builder: Add nir_i2i and nir_u2u helpers which take a bit size
12+abfe674c54bee6f8fdcae411b07db89c10b9d530 spirv: Handle arbitrary bit sizes for deref array indices
13+
14+# warn The commits refer stale sha, yet don't fix anything in particular.
15+98984b7cdd79c15cc7331c791f8be61e873b8bbd Revert "mapi/new: sort by slot number"
16+9f86f1da7c68b5b900cd6f60925610ff1225a72d egl: add glvnd entrypoints for EGL_MESA_query_driver
--- a/bin/get-pick-list.sh
+++ b/bin/get-pick-list.sh
@@ -44,7 +44,7 @@ is_sha_nomination()
4444 # Treat only the current line
4545 id=`echo "$fixes" | tail -n $fixes_count | head -n 1 | cut -d : -f 2`
4646 fixes_count=$(($fixes_count-1))
47- if ! git show $id &>/dev/null; then
47+ if ! git show $id >/dev/null 2>&1; then
4848 echo WARNING: Commit $1 lists invalid sha $id
4949 fi
5050 done
@@ -143,7 +143,7 @@ do
143143 esac
144144
145145 printf "[ %8s ] " "$tag"
146- git --no-pager show --summary --oneline $sha
146+ git --no-pager show --no-patch --oneline $sha
147147 done
148148
149149 rm -f already_picked
--- a/configure.ac
+++ b/configure.ac
@@ -1864,6 +1864,7 @@ for plat in $platforms; do
18641864 ;;
18651865
18661866 drm)
1867+ test "x$enable_egl" = "xyes" &&
18671868 test "x$enable_gbm" = "xno" &&
18681869 AC_MSG_ERROR([EGL platform drm needs gbm])
18691870 DEFINES="$DEFINES -DHAVE_DRM_PLATFORM"
--- /dev/null
+++ b/docs/relnotes/18.3.3.html
@@ -0,0 +1,208 @@
1+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
2+<html lang="en">
3+<head>
4+ <meta http-equiv="content-type" content="text/html; charset=utf-8">
5+ <title>Mesa Release Notes</title>
6+ <link rel="stylesheet" type="text/css" href="../mesa.css">
7+</head>
8+<body>
9+
10+<div class="header">
11+ <h1>The Mesa 3D Graphics Library</h1>
12+</div>
13+
14+<iframe src="../contents.html"></iframe>
15+<div class="content">
16+
17+<h1>Mesa 18.3.3 Release Notes / January 31, 2019</h1>
18+
19+<p>
20+Mesa 18.3.3 is a bug fix release which fixes bugs found since the 18.3.2 release.
21+</p>
22+<p>
23+Mesa 18.3.3 implements the OpenGL 4.5 API, but the version reported by
24+glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
25+glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
26+Some drivers don't support all the features required in OpenGL 4.5. OpenGL
27+4.5 is <strong>only</strong> available if requested at context creation.
28+Compatibility contexts may report a lower version depending on each driver.
29+</p>
30+
31+
32+<h2>SHA256 checksums</h2>
33+<pre>
34+6b9893942fe8011c7736d51448deb6ef80ece2257e0fac27b02e997a6605d5e4 mesa-18.3.3.tar.gz
35+2ab6886a6966c532ccbcc3b240925e681464b658244f0cbed752615af3936299 mesa-18.3.3.tar.xz
36+</pre>
37+
38+
39+<h2>New features</h2>
40+<p>None</p>
41+
42+
43+<h2>Bug fixes</h2>
44+
45+<ul>
46+
47+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=108877">Bug 108877</a> - OpenGL CTS gl43 test cases were interrupted due to segment fault</li>
48+
49+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109023">Bug 109023</a> - error: inlining failed in call to always_inline ‘__m512 _mm512_and_ps(__m512, __m512)’: target specific option mismatch</li>
50+
51+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109129">Bug 109129</a> - format_types.h:1220: undefined reference to `_mm256_cvtps_ph'</li>
52+
53+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109229">Bug 109229</a> - glLinkProgram locks up for ~30 seconds</li>
54+
55+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109242">Bug 109242</a> - [RADV] The Witcher 3 system freeze</li>
56+
57+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109488">Bug 109488</a> - Mesa 18.3.2 crash on a specific fragment shader (assert triggered) / already fixed on the master branch.</li>
58+
59+</ul>
60+
61+
62+<h2>Changes</h2>
63+
64+<p>Andres Gomez (2):</p>
65+<ul>
66+ <li>bin/get-pick-list.sh: fix the oneline printing</li>
67+ <li>bin/get-pick-list.sh: fix redirection in sh</li>
68+</ul>
69+
70+<p>Axel Davy (1):</p>
71+<ul>
72+ <li>st/nine: Immediately upload user provided textures</li>
73+</ul>
74+
75+<p>Bas Nieuwenhuizen (3):</p>
76+<ul>
77+ <li>radv: Only use 32 KiB per threadgroup on Stoney.</li>
78+ <li>radv: Set partial_vs_wave for pipelines with just GS, not tess.</li>
79+ <li>nir: Account for atomics in copy propagation.</li>
80+</ul>
81+
82+<p>Bruce Cherniak (1):</p>
83+<ul>
84+ <li>gallium/swr: Fix multi-context sync fence deadlock.</li>
85+</ul>
86+
87+<p>Carsten Haitzler (Rasterman) (2):</p>
88+<ul>
89+ <li>vc4: Use named parameters for the NEON inline asm.</li>
90+ <li>vc4: Declare the cpu pointers as being modified in NEON asm.</li>
91+</ul>
92+
93+<p>Danylo Piliaiev (1):</p>
94+<ul>
95+ <li>glsl: Fix copying function's out to temp if dereferenced by array</li>
96+</ul>
97+
98+<p>Dave Airlie (3):</p>
99+<ul>
100+ <li>dri_interface: add put shm image2 (v2)</li>
101+ <li>glx: add support for putimageshm2 path (v2)</li>
102+ <li>gallium: use put image shm2 path (v2)</li>
103+</ul>
104+
105+<p>Dylan Baker (4):</p>
106+<ul>
107+ <li>meson: allow building dri driver without window system if osmesa is classic</li>
108+ <li>meson: fix swr KNL build</li>
109+ <li>meson: Fix compiler checks for SWR with ICC</li>
110+ <li>meson: Add warnings and errors when using ICC</li>
111+</ul>
112+
113+<p>Emil Velikov (4):</p>
114+<ul>
115+ <li>docs: add sha256 checksums for 18.3.2</li>
116+ <li>cherry-ignore: radv: Fix multiview depth clears</li>
117+ <li>cherry-ignore: spirv: Handle arbitrary bit sizes for deref array indices</li>
118+ <li>cherry-ignore: WARNING: Commit XXX lists invalid sha</li>
119+</ul>
120+
121+<p>Eric Anholt (2):</p>
122+<ul>
123+ <li>vc4: Don't leak the GPU fd for renderonly usage.</li>
124+ <li>vc4: Enable NEON asm on meson cross-builds.</li>
125+</ul>
126+
127+<p>Eric Engestrom (2):</p>
128+<ul>
129+ <li>configure: EGL requirements only apply if EGL is built</li>
130+ <li>meson/vdpau: add missing soversion</li>
131+</ul>
132+
133+<p>Iago Toral Quiroga (1):</p>
134+<ul>
135+ <li>anv/device: fix maximum number of images supported</li>
136+</ul>
137+
138+<p>Jason Ekstrand (3):</p>
139+<ul>
140+ <li>anv/nir: Rework arguments to apply_pipeline_layout</li>
141+ <li>anv: Only parse pImmutableSamplers if the descriptor has samplers</li>
142+ <li>nir/xfb: Fix offset accounting for dvec3/4</li>
143+</ul>
144+
145+<p>Karol Herbst (2):</p>
146+<ul>
147+ <li>nv50/ir: disable tryCollapseChainedMULs in ConstantFolding for precise instructions</li>
148+ <li>glsl/lower_output_reads: set invariant and precise flags on temporaries</li>
149+</ul>
150+
151+<p>Lionel Landwerlin (1):</p>
152+<ul>
153+ <li>anv: fix invalid binding table index computation</li>
154+</ul>
155+
156+<p>Marek Olšák (4):</p>
157+<ul>
158+ <li>radeonsi: also apply the GS hang workaround to draws without tessellation</li>
159+ <li>radeonsi: fix a u_blitter crash after a shader with FBFETCH</li>
160+ <li>radeonsi: fix rendering to tiny viewports where the viewport center is &gt; 8K</li>
161+ <li>st/mesa: purge framebuffers when unbinding a context</li>
162+</ul>
163+
164+<p>Niklas Haas (1):</p>
165+<ul>
166+ <li>radv: correctly use vulkan 1.0 by default</li>
167+</ul>
168+
169+<p>Pierre Moreau (1):</p>
170+<ul>
171+ <li>meson: Fix with_gallium_icd to with_opencl_icd</li>
172+</ul>
173+
174+<p>Rob Clark (1):</p>
175+<ul>
176+ <li>loader: fix the no-modifiers case</li>
177+</ul>
178+
179+<p>Samuel Pitoiset (1):</p>
180+<ul>
181+ <li>radv: clean up setting partial_es_wave for distributed tess on VI</li>
182+</ul>
183+
184+<p>Timothy Arceri (5):</p>
185+<ul>
186+ <li>ac/nir_to_llvm: fix interpolateAt* for arrays</li>
187+ <li>ac/nir_to_llvm: fix clamp shadow reference for more hardware</li>
188+ <li>radv/ac: fix some fp16 handling</li>
189+ <li>glsl: use remap location when serialising uniform program resource data</li>
190+ <li>glsl: Copy function out to temp if we don't directly ref a variable</li>
191+</ul>
192+
193+<p>Tomeu Vizoso (1):</p>
194+<ul>
195+ <li>etnaviv: Consolidate buffer references from framebuffers</li>
196+</ul>
197+
198+<p>Vinson Lee (1):</p>
199+<ul>
200+ <li>meson: Fix typo.</li>
201+</ul>
202+
203+
204+
205+</div>
206+</body>
207+</html>
208+
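
The release notes added above point out that the GL version reported at runtime depends on the driver and on the context that was requested. For reference, a minimal query sketch using standard OpenGL calls (not part of this commit; context creation and error handling omitted):

    #include <stdio.h>
    #include <GL/gl.h>
    #include <GL/glext.h>   /* GL_MAJOR_VERSION / GL_MINOR_VERSION on older headers */

    /* Assumes a current GL context already exists. */
    static void print_gl_version(void)
    {
        GLint major = 0, minor = 0;

        printf("GL_VERSION:  %s\n", (const char *)glGetString(GL_VERSION));
        printf("GL_RENDERER: %s\n", (const char *)glGetString(GL_RENDERER));

        /* Numeric form, available since OpenGL 3.0. */
        glGetIntegerv(GL_MAJOR_VERSION, &major);
        glGetIntegerv(GL_MINOR_VERSION, &minor);
        printf("Context version: %d.%d\n", major, minor);
    }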
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -589,7 +589,7 @@ struct __DRIdamageExtensionRec {
589589 * SWRast Loader extension.
590590 */
591591 #define __DRI_SWRAST_LOADER "DRI_SWRastLoader"
592-#define __DRI_SWRAST_LOADER_VERSION 4
592+#define __DRI_SWRAST_LOADER_VERSION 5
593593 struct __DRIswrastLoaderExtensionRec {
594594 __DRIextension base;
595595
@@ -649,6 +649,23 @@ struct __DRIswrastLoaderExtensionRec {
649649 void (*getImageShm)(__DRIdrawable *readable,
650650 int x, int y, int width, int height,
651651 int shmid, void *loaderPrivate);
652+
653+ /**
654+ * Put shm image to drawable (v2)
655+ *
656+ * The original version fixes srcx/y to 0, and expected
657+ * the offset to be adjusted. This version allows src x,y
658+ * to not be included in the offset. This is needed to
659+ * avoid certain overflow checks in the X server, that
660+ * result in lost rendering.
661+ *
662+ * \since 5
663+ */
664+ void (*putImageShm2)(__DRIdrawable *drawable, int op,
665+ int x, int y,
666+ int width, int height, int stride,
667+ int shmid, char *shmaddr, unsigned offset,
668+ void *loaderPrivate);
652669 };
653670
654671 /**
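
The putImageShm2 comment above is the key semantic change: the source x offset is no longer baked into the byte offset but passed through the x coordinate, which avoids the X server overflow checks mentioned in the comment. A caller-side sketch of splitting the offset (hypothetical struct and helper names; the real plumbing is in the drisw.c and dri_sw_winsys.c hunks further down):

    struct shm_offsets {
        unsigned row_offset;  /* byte offset of the first row in the shm segment */
        unsigned x_offset;    /* byte offset of the first pixel within that row */
    };

    static struct shm_offsets
    split_shm_offset(unsigned stride, unsigned blocksize,
                     unsigned box_x, unsigned box_y)
    {
        struct shm_offsets o;
        o.row_offset = stride * box_y;    /* a v5 loader's putImageShm2 gets only this */
        o.x_offset   = box_x * blocksize; /* ...and receives x separately */
        return o;
    }

    /* Loaders with base.version >= 5 and a non-NULL putImageShm2 are given
     * row_offset plus the x coordinate; older loaders fall back to
     * putImageShm(row_offset + x_offset), as the drisw.c hunk below shows. */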
--- a/meson.build
+++ b/meson.build
@@ -1,4 +1,4 @@
1-# Copyright © 2017-2018 Intel Corporation
1+# Copyright © 2017-2019 Intel Corporation
22
33 # Permission is hereby granted, free of charge, to any person obtaining a copy
44 # of this software and associated documentation files (the "Software"), to deal
@@ -165,6 +165,14 @@ with_gallium_svga = _drivers.contains('svga')
165165 with_gallium_virgl = _drivers.contains('virgl')
166166 with_gallium_swr = _drivers.contains('swr')
167167
168+if cc.get_id() == 'intel'
169+ if meson.version().version_compare('< 0.49.0')
170+ error('Meson does not have sufficient support of ICC before 0.49.0 to compile mesa')
171+ elif with_gallium_swr and meson.version().version_compare('== 0.49.0')
172+ warning('Meson as of 0.49.0 is sufficient for compiling mesa with ICC, but there are some caveats with SWR. 0.49.1 should resolve all of these')
173+ endif
174+endif
175+
168176 with_gallium = _drivers.length() != 0 and _drivers != ['']
169177
170178 if with_gallium and system_has_kms_drm
@@ -385,8 +393,8 @@ if with_any_vk and (with_platform_x11 and not with_dri3)
385393 error('Vulkan drivers require dri3 for X11 support')
386394 endif
387395 if with_dri
388- if with_glx == 'disabled' and not with_egl and not with_gbm
389- error('building dri drivers require at least one windowing system')
396+ if with_glx == 'disabled' and not with_egl and not with_gbm and with_osmesa != 'classic'
397+ error('building dri drivers require at least one windowing system or classic osmesa')
390398 endif
391399 endif
392400
@@ -671,7 +679,7 @@ if _opencl != 'disabled'
671679 else
672680 dep_clc = null_dep
673681 with_gallium_opencl = false
674- with_gallium_icd = false
682+ with_opencl_icd = false
675683 endif
676684
677685 gl_pkgconfig_c_flags = []
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2802,15 +2802,16 @@ static LLVMValueRef visit_interp(struct ac_nir_context *ctx,
28022802 const nir_intrinsic_instr *instr)
28032803 {
28042804 LLVMValueRef result[4];
2805- LLVMValueRef interp_param, attr_number;
2805+ LLVMValueRef interp_param;
28062806 unsigned location;
28072807 unsigned chan;
28082808 LLVMValueRef src_c0 = NULL;
28092809 LLVMValueRef src_c1 = NULL;
28102810 LLVMValueRef src0 = NULL;
28112811
2812- nir_variable *var = nir_deref_instr_get_variable(nir_instr_as_deref(instr->src[0].ssa->parent_instr));
2813- int input_index = ctx->abi->fs_input_attr_indices[var->data.location - VARYING_SLOT_VAR0];
2812+ nir_deref_instr *deref_instr = nir_instr_as_deref(instr->src[0].ssa->parent_instr);
2813+ nir_variable *var = nir_deref_instr_get_variable(deref_instr);
2814+ int input_base = ctx->abi->fs_input_attr_indices[var->data.location - VARYING_SLOT_VAR0];
28142815 switch (instr->intrinsic) {
28152816 case nir_intrinsic_interp_deref_at_centroid:
28162817 location = INTERP_CENTROID;
@@ -2840,7 +2841,6 @@ static LLVMValueRef visit_interp(struct ac_nir_context *ctx,
28402841 src_c1 = LLVMBuildFSub(ctx->ac.builder, src_c1, halfval, "");
28412842 }
28422843 interp_param = ctx->abi->lookup_interp_param(ctx->abi, var->data.interpolation, location);
2843- attr_number = LLVMConstInt(ctx->ac.i32, input_index, false);
28442844
28452845 if (location == INTERP_CENTER) {
28462846 LLVMValueRef ij_out[2];
@@ -2878,26 +2878,65 @@ static LLVMValueRef visit_interp(struct ac_nir_context *ctx,
28782878
28792879 }
28802880
2881+ LLVMValueRef array_idx = ctx->ac.i32_0;
2882+ while(deref_instr->deref_type != nir_deref_type_var) {
2883+ if (deref_instr->deref_type == nir_deref_type_array) {
2884+ unsigned array_size = glsl_get_aoa_size(deref_instr->type);
2885+ if (!array_size)
2886+ array_size = 1;
2887+
2888+ LLVMValueRef offset;
2889+ nir_const_value *const_value = nir_src_as_const_value(deref_instr->arr.index);
2890+ if (const_value) {
2891+ offset = LLVMConstInt(ctx->ac.i32, array_size * const_value->u32[0], false);
2892+ } else {
2893+ LLVMValueRef indirect = get_src(ctx, deref_instr->arr.index);
2894+
2895+ offset = LLVMBuildMul(ctx->ac.builder, indirect,
2896+ LLVMConstInt(ctx->ac.i32, array_size, false), "");
2897+ }
2898+
2899+ array_idx = LLVMBuildAdd(ctx->ac.builder, array_idx, offset, "");
2900+ deref_instr = nir_src_as_deref(deref_instr->parent);
2901+ } else {
2902+ unreachable("Unsupported deref type");
2903+ }
2904+
2905+ }
2906+
2907+ unsigned input_array_size = glsl_get_aoa_size(var->type);
2908+ if (!input_array_size)
2909+ input_array_size = 1;
2910+
28812911 for (chan = 0; chan < 4; chan++) {
2912+ LLVMValueRef gather = LLVMGetUndef(LLVMVectorType(ctx->ac.f32, input_array_size));
28822913 LLVMValueRef llvm_chan = LLVMConstInt(ctx->ac.i32, chan, false);
28832914
2884- if (interp_param) {
2885- interp_param = LLVMBuildBitCast(ctx->ac.builder,
2915+ for (unsigned idx = 0; idx < input_array_size; ++idx) {
2916+ LLVMValueRef v, attr_number;
2917+
2918+ attr_number = LLVMConstInt(ctx->ac.i32, input_base + idx, false);
2919+ if (interp_param) {
2920+ interp_param = LLVMBuildBitCast(ctx->ac.builder,
28862921 interp_param, ctx->ac.v2f32, "");
2887- LLVMValueRef i = LLVMBuildExtractElement(
2888- ctx->ac.builder, interp_param, ctx->ac.i32_0, "");
2889- LLVMValueRef j = LLVMBuildExtractElement(
2890- ctx->ac.builder, interp_param, ctx->ac.i32_1, "");
2891-
2892- result[chan] = ac_build_fs_interp(&ctx->ac,
2893- llvm_chan, attr_number,
2894- ctx->abi->prim_mask, i, j);
2895- } else {
2896- result[chan] = ac_build_fs_interp_mov(&ctx->ac,
2897- LLVMConstInt(ctx->ac.i32, 2, false),
2898- llvm_chan, attr_number,
2899- ctx->abi->prim_mask);
2922+ LLVMValueRef i = LLVMBuildExtractElement(
2923+ ctx->ac.builder, interp_param, ctx->ac.i32_0, "");
2924+ LLVMValueRef j = LLVMBuildExtractElement(
2925+ ctx->ac.builder, interp_param, ctx->ac.i32_1, "");
2926+
2927+ v = ac_build_fs_interp(&ctx->ac, llvm_chan, attr_number,
2928+ ctx->abi->prim_mask, i, j);
2929+ } else {
2930+ v = ac_build_fs_interp_mov(&ctx->ac, LLVMConstInt(ctx->ac.i32, 2, false),
2931+ llvm_chan, attr_number, ctx->abi->prim_mask);
2932+ }
2933+
2934+ gather = LLVMBuildInsertElement(ctx->ac.builder, gather, v,
2935+ LLVMConstInt(ctx->ac.i32, idx, false), "");
29002936 }
2937+
2938+ result[chan] = LLVMBuildExtractElement(ctx->ac.builder, gather, array_idx, "");
2939+
29012940 }
29022941 return ac_build_varying_gather_values(&ctx->ac, result, instr->num_components,
29032942 var->data.location_frac);
@@ -3460,7 +3499,7 @@ static void visit_tex(struct ac_nir_context *ctx, nir_tex_instr *instr)
34603499 * It's unnecessary if the original texture format was
34613500 * Z32_FLOAT, but we don't know that here.
34623501 */
3463- if (args.compare && ctx->ac.chip_class == VI && ctx->abi->clamp_shadow_reference)
3502+ if (args.compare && ctx->ac.chip_class >= VI && ctx->abi->clamp_shadow_reference)
34643503 args.compare = ac_build_clamp(&ctx->ac, ac_to_float(&ctx->ac, args.compare));
34653504
34663505 /* pack derivatives */
@@ -3851,7 +3890,7 @@ ac_handle_shader_output_decl(struct ac_llvm_context *ctx,
38513890 }
38523891 }
38533892
3854- bool is_16bit = glsl_type_is_16bit(variable->type);
3893+ bool is_16bit = glsl_type_is_16bit(glsl_without_array(variable->type));
38553894 LLVMTypeRef type = is_16bit ? ctx->f16 : ctx->f32;
38563895 for (unsigned i = 0; i < attrib_count; ++i) {
38573896 for (unsigned chan = 0; chan < 4; chan++) {
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -525,7 +525,7 @@ VkResult radv_CreateInstance(
525525 pCreateInfo->pApplicationInfo->apiVersion != 0) {
526526 client_version = pCreateInfo->pApplicationInfo->apiVersion;
527527 } else {
528- radv_EnumerateInstanceVersion(&client_version);
528+ client_version = VK_API_VERSION_1_0;
529529 }
530530
531531 instance = vk_zalloc2(&default_alloc, pAllocator, sizeof(*instance), 8,
--- a/src/amd/vulkan/radv_nir_to_llvm.c
+++ b/src/amd/vulkan/radv_nir_to_llvm.c
@@ -256,7 +256,16 @@ get_tcs_num_patches(struct radv_shader_context *ctx)
256256 /* Make sure that the data fits in LDS. This assumes the shaders only
257257 * use LDS for the inputs and outputs.
258258 */
259- hardware_lds_size = ctx->options->chip_class >= CIK ? 65536 : 32768;
259+ hardware_lds_size = 32768;
260+
261+ /* Looks like STONEY hangs if we use more than 32 KiB LDS in a single
262+ * threadgroup, even though there is more than 32 KiB LDS.
263+ *
264+ * Test: dEQP-VK.tessellation.shader_input_output.barrier
265+ */
266+ if (ctx->options->chip_class >= CIK && ctx->options->family != CHIP_STONEY)
267+ hardware_lds_size = 65536;
268+
260269 num_patches = MIN2(num_patches, hardware_lds_size / (input_patch_size + output_patch_size));
261270 /* Make sure the output data fits in the offchip buffer */
262271 num_patches = MIN2(num_patches, (ctx->options->tess_offchip_block_dw_size * 4) / output_patch_size);
@@ -2160,7 +2169,7 @@ handle_fs_input_decl(struct radv_shader_context *ctx,
21602169
21612170 interp = lookup_interp_param(&ctx->abi, variable->data.interpolation, interp_type);
21622171 }
2163- bool is_16bit = glsl_type_is_16bit(variable->type);
2172+ bool is_16bit = glsl_type_is_16bit(glsl_without_array(variable->type));
21642173 LLVMTypeRef type = is_16bit ? ctx->ac.i16 : ctx->ac.i32;
21652174 if (interp == NULL)
21662175 interp = LLVMGetUndef(type);
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -3371,14 +3371,8 @@ radv_compute_ia_multi_vgt_param_helpers(struct radv_pipeline *pipeline,
33713371 else
33723372 ia_multi_vgt_param.primgroup_size = 128; /* recommended without a GS */
33733373
3374- ia_multi_vgt_param.partial_es_wave = false;
3375- if (pipeline->device->has_distributed_tess) {
3376- if (radv_pipeline_has_gs(pipeline)) {
3377- if (device->physical_device->rad_info.chip_class <= VI)
3378- ia_multi_vgt_param.partial_es_wave = true;
3379- }
3380- }
33813374 /* GS requirement. */
3375+ ia_multi_vgt_param.partial_es_wave = false;
33823376 if (radv_pipeline_has_gs(pipeline) && device->physical_device->rad_info.chip_class <= VI)
33833377 if (SI_GS_PER_ES / ia_multi_vgt_param.primgroup_size >= pipeline->device->gs_table_depth - 3)
33843378 ia_multi_vgt_param.partial_es_wave = true;
@@ -3424,13 +3418,8 @@ radv_compute_ia_multi_vgt_param_helpers(struct radv_pipeline *pipeline,
34243418 /* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
34253419 if (device->has_distributed_tess) {
34263420 if (radv_pipeline_has_gs(pipeline)) {
3427- if (device->physical_device->rad_info.family == CHIP_TONGA ||
3428- device->physical_device->rad_info.family == CHIP_FIJI ||
3429- device->physical_device->rad_info.family == CHIP_POLARIS10 ||
3430- device->physical_device->rad_info.family == CHIP_POLARIS11 ||
3431- device->physical_device->rad_info.family == CHIP_POLARIS12 ||
3432- device->physical_device->rad_info.family == CHIP_VEGAM)
3433- ia_multi_vgt_param.partial_vs_wave = true;
3421+ if (device->physical_device->rad_info.chip_class <= VI)
3422+ ia_multi_vgt_param.partial_es_wave = true;
34343423 } else {
34353424 ia_multi_vgt_param.partial_vs_wave = true;
34363425 }
@@ -3448,6 +3437,26 @@ radv_compute_ia_multi_vgt_param_helpers(struct radv_pipeline *pipeline,
34483437 ia_multi_vgt_param.partial_vs_wave = true;
34493438 }
34503439
3440+ if (radv_pipeline_has_gs(pipeline)) {
3441+ /* On these chips there is the possibility of a hang if the
3442+ * pipeline uses a GS and partial_vs_wave is not set.
3443+ *
3444+ * This mostly does not hit 4-SE chips, as those typically set
3445+ * ia_switch_on_eoi and then partial_vs_wave is set for pipelines
3446+ * with GS due to another workaround.
3447+ *
3448+ * Reproducer: https://bugs.freedesktop.org/show_bug.cgi?id=109242
3449+ */
3450+ if (device->physical_device->rad_info.family == CHIP_TONGA ||
3451+ device->physical_device->rad_info.family == CHIP_FIJI ||
3452+ device->physical_device->rad_info.family == CHIP_POLARIS10 ||
3453+ device->physical_device->rad_info.family == CHIP_POLARIS11 ||
3454+ device->physical_device->rad_info.family == CHIP_POLARIS12 ||
3455+ device->physical_device->rad_info.family == CHIP_VEGAM) {
3456+ ia_multi_vgt_param.partial_vs_wave = true;
3457+ }
3458+ }
3459+
34513460 ia_multi_vgt_param.base =
34523461 S_028AA8_PRIMGROUP_SIZE(ia_multi_vgt_param.primgroup_size - 1) |
34533462 /* The following field was moved to VGT_SHADER_STAGES_EN in GFX9. */
--- a/src/compiler/glsl/ast_function.cpp
+++ b/src/compiler/glsl/ast_function.cpp
@@ -363,31 +363,29 @@ copy_index_derefs_to_temps(ir_instruction *ir, void *data)
363363 ir = a->array->as_dereference();
364364
365365 ir_rvalue *idx = a->array_index;
366- if (idx->as_dereference_variable()) {
367- ir_variable *var = idx->variable_referenced();
366+ ir_variable *var = idx->variable_referenced();
368367
369- /* If the index is read only it cannot change so there is no need
370- * to copy it.
371- */
372- if (var->data.read_only || var->data.memory_read_only)
373- return;
374-
375- ir_variable *tmp = new(d->mem_ctx) ir_variable(idx->type, "idx_tmp",
376- ir_var_temporary);
377- d->before_instructions->push_tail(tmp);
378-
379- ir_dereference_variable *const deref_tmp_1 =
380- new(d->mem_ctx) ir_dereference_variable(tmp);
381- ir_assignment *const assignment =
382- new(d->mem_ctx) ir_assignment(deref_tmp_1,
383- idx->clone(d->mem_ctx, NULL));
384- d->before_instructions->push_tail(assignment);
385-
386- /* Replace the array index with a dereference of the new temporary */
387- ir_dereference_variable *const deref_tmp_2 =
388- new(d->mem_ctx) ir_dereference_variable(tmp);
389- a->array_index = deref_tmp_2;
390- }
368+ /* If the index is read only it cannot change so there is no need
369+ * to copy it.
370+ */
371+ if (!var || var->data.read_only || var->data.memory_read_only)
372+ return;
373+
374+ ir_variable *tmp = new(d->mem_ctx) ir_variable(idx->type, "idx_tmp",
375+ ir_var_temporary);
376+ d->before_instructions->push_tail(tmp);
377+
378+ ir_dereference_variable *const deref_tmp_1 =
379+ new(d->mem_ctx) ir_dereference_variable(tmp);
380+ ir_assignment *const assignment =
381+ new(d->mem_ctx) ir_assignment(deref_tmp_1,
382+ idx->clone(d->mem_ctx, NULL));
383+ d->before_instructions->push_tail(assignment);
384+
385+ /* Replace the array index with a dereference of the new temporary */
386+ ir_dereference_variable *const deref_tmp_2 =
387+ new(d->mem_ctx) ir_dereference_variable(tmp);
388+ a->array_index = deref_tmp_2;
391389 }
392390 }
393391
@@ -402,7 +400,8 @@ fix_parameter(void *mem_ctx, ir_rvalue *actual, const glsl_type *formal_type,
402400 * nothing needs to be done to fix the parameter.
403401 */
404402 if (formal_type == actual->type
405- && (expr == NULL || expr->operation != ir_binop_vector_extract))
403+ && (expr == NULL || expr->operation != ir_binop_vector_extract)
404+ && actual->as_dereference_variable())
406405 return;
407406
408407 /* An array index could also be an out variable so we need to make a copy
@@ -456,7 +455,7 @@ fix_parameter(void *mem_ctx, ir_rvalue *actual, const glsl_type *formal_type,
456455 ir_dereference_variable *const deref_tmp_1 =
457456 new(mem_ctx) ir_dereference_variable(tmp);
458457 ir_assignment *const assignment =
459- new(mem_ctx) ir_assignment(deref_tmp_1, actual);
458+ new(mem_ctx) ir_assignment(deref_tmp_1, actual->clone(mem_ctx, NULL));
460459 before_instructions->push_tail(assignment);
461460 }
462461
--- a/src/compiler/glsl/lower_output_reads.cpp
+++ b/src/compiler/glsl/lower_output_reads.cpp
@@ -101,6 +101,10 @@ output_read_remover::visit(ir_dereference_variable *ir)
101101 void *var_ctx = ralloc_parent(ir->var);
102102 temp = new(var_ctx) ir_variable(ir->var->type, ir->var->name,
103103 ir_var_temporary);
104+ /* copy flags which affect arithematical precision */
105+ temp->data.invariant = ir->var->data.invariant;
106+ temp->data.precise = ir->var->data.precise;
107+ temp->data.precision = ir->var->data.precision;
104108 _mesa_hash_table_insert(replacements, ir->var, temp);
105109 ir->var->insert_after(temp);
106110 }
--- a/src/compiler/glsl/serialize.cpp
+++ b/src/compiler/glsl/serialize.cpp
@@ -764,6 +764,12 @@ get_shader_var_and_pointer_sizes(size_t *s_var_size, size_t *s_var_ptrs,
764764 sizeof(var->name);
765765 }
766766
767+enum uniform_type
768+{
769+ uniform_remapped,
770+ uniform_not_remapped
771+};
772+
767773 static void
768774 write_program_resource_data(struct blob *metadata,
769775 struct gl_shader_program *prog,
@@ -816,12 +822,19 @@ write_program_resource_data(struct blob *metadata,
816822 case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
817823 case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
818824 case GL_UNIFORM:
819- for (unsigned i = 0; i < prog->data->NumUniformStorage; i++) {
820- if (strcmp(((gl_uniform_storage *)res->Data)->name,
821- prog->data->UniformStorage[i].name) == 0) {
822- blob_write_uint32(metadata, i);
823- break;
825+ if (((gl_uniform_storage *)res->Data)->builtin ||
826+ res->Type != GL_UNIFORM) {
827+ blob_write_uint32(metadata, uniform_not_remapped);
828+ for (unsigned i = 0; i < prog->data->NumUniformStorage; i++) {
829+ if (strcmp(((gl_uniform_storage *)res->Data)->name,
830+ prog->data->UniformStorage[i].name) == 0) {
831+ blob_write_uint32(metadata, i);
832+ break;
833+ }
824834 }
835+ } else {
836+ blob_write_uint32(metadata, uniform_remapped);
837+ blob_write_uint32(metadata, ((gl_uniform_storage *)res->Data)->remap_location);
825838 }
826839 break;
827840 case GL_ATOMIC_COUNTER_BUFFER:
@@ -906,9 +919,15 @@ read_program_resource_data(struct blob_reader *metadata,
906919 case GL_COMPUTE_SUBROUTINE_UNIFORM:
907920 case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
908921 case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
909- case GL_UNIFORM:
910- res->Data = &prog->data->UniformStorage[blob_read_uint32(metadata)];
922+ case GL_UNIFORM: {
923+ enum uniform_type type = (enum uniform_type) blob_read_uint32(metadata);
924+ if (type == uniform_not_remapped) {
925+ res->Data = &prog->data->UniformStorage[blob_read_uint32(metadata)];
926+ } else {
927+ res->Data = prog->UniformRemapTable[blob_read_uint32(metadata)];
928+ }
911929 break;
930+ }
912931 case GL_ATOMIC_COUNTER_BUFFER:
913932 res->Data = &prog->data->AtomicBuffers[blob_read_uint32(metadata)];
914933 break;
--- a/src/compiler/nir/nir_gather_xfb_info.c
+++ b/src/compiler/nir/nir_gather_xfb_info.c
@@ -76,13 +76,13 @@ add_var_xfb_outputs(nir_xfb_info *xfb,
7676 nir_xfb_output_info *output = &xfb->outputs[xfb->output_count++];
7777
7878 output->buffer = var->data.xfb_buffer;
79- output->offset = *offset;
79+ output->offset = *offset + s * 16;
8080 output->location = *location;
8181 output->component_mask = (comp_mask >> (s * 4)) & 0xf;
8282
8383 (*location)++;
84- *offset += comp_slots * 4;
8584 }
85+ *offset += comp_slots * 4;
8686 }
8787 }
8888
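
The reordering above fixes transform-feedback offsets for variables that span more than one vec4 slot (dvec3/dvec4 take two). A worked example of the arithmetic for a dvec4 output starting at byte offset 0 (illustration only, not the Mesa code; constants follow the hunk: 8 component slots, 2 attribute slots, 16 bytes per slot):

    enum { COMP_SLOTS = 8, ATTRIB_SLOTS = 2, SLOT_BYTES = 16 };

    static void dvec4_xfb_offsets(unsigned *base, unsigned out_offsets[ATTRIB_SLOTS])
    {
        /* New behaviour: slot s sits one 16-byte slot after the previous one. */
        for (unsigned s = 0; s < ATTRIB_SLOTS; s++)
            out_offsets[s] = *base + s * SLOT_BYTES;   /* 0 and 16 */

        /* The running offset advances once, by the full variable size. */
        *base += COMP_SLOTS * 4;                       /* +32 bytes */

        /* The old code advanced *base inside the loop, so the second slot
         * landed at byte 32 instead of 16 and the next variable at 64. */
    }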
--- a/src/compiler/nir/nir_opt_copy_prop_vars.c
+++ b/src/compiler/nir/nir_opt_copy_prop_vars.c
@@ -143,9 +143,19 @@ gather_vars_written(struct copy_prop_var_state *state,
143143 written->modes = nir_var_shader_out;
144144 break;
145145
146+ case nir_intrinsic_deref_atomic_add:
147+ case nir_intrinsic_deref_atomic_imin:
148+ case nir_intrinsic_deref_atomic_umin:
149+ case nir_intrinsic_deref_atomic_imax:
150+ case nir_intrinsic_deref_atomic_umax:
151+ case nir_intrinsic_deref_atomic_and:
152+ case nir_intrinsic_deref_atomic_or:
153+ case nir_intrinsic_deref_atomic_xor:
154+ case nir_intrinsic_deref_atomic_exchange:
155+ case nir_intrinsic_deref_atomic_comp_swap:
146156 case nir_intrinsic_store_deref:
147157 case nir_intrinsic_copy_deref: {
148- /* Destination in _both_ store_deref and copy_deref is src[0]. */
158+ /* Destination in all of store_deref, copy_deref and the atomics is src[0]. */
149159 nir_deref_instr *dst = nir_src_as_deref(intrin->src[0]);
150160
151161 uintptr_t mask = intrin->intrinsic == nir_intrinsic_store_deref ?
@@ -750,6 +760,19 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
750760 break;
751761 }
752762
763+ case nir_intrinsic_deref_atomic_add:
764+ case nir_intrinsic_deref_atomic_imin:
765+ case nir_intrinsic_deref_atomic_umin:
766+ case nir_intrinsic_deref_atomic_imax:
767+ case nir_intrinsic_deref_atomic_umax:
768+ case nir_intrinsic_deref_atomic_and:
769+ case nir_intrinsic_deref_atomic_or:
770+ case nir_intrinsic_deref_atomic_xor:
771+ case nir_intrinsic_deref_atomic_exchange:
772+ case nir_intrinsic_deref_atomic_comp_swap:
773+ kill_aliases(copies, nir_src_as_deref(intrin->src[0]), 0xf);
774+ break;
775+
753776 default:
754777 break;
755778 }
--- a/src/gallium/drivers/etnaviv/etnaviv_context.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_context.c
@@ -60,6 +60,8 @@ etna_context_destroy(struct pipe_context *pctx)
6060 {
6161 struct etna_context *ctx = etna_context(pctx);
6262
63+ util_copy_framebuffer_state(&ctx->framebuffer_s, NULL);
64+
6365 if (ctx->primconvert)
6466 util_primconvert_destroy(ctx->primconvert);
6567
@@ -296,10 +298,10 @@ etna_draw_vbo(struct pipe_context *pctx, const struct pipe_draw_info *info)
296298 if (DBG_ENABLED(ETNA_DBG_FLUSH_ALL))
297299 pctx->flush(pctx, NULL, 0);
298300
299- if (ctx->framebuffer.cbuf)
300- etna_resource(ctx->framebuffer.cbuf->texture)->seqno++;
301- if (ctx->framebuffer.zsbuf)
302- etna_resource(ctx->framebuffer.zsbuf->texture)->seqno++;
301+ if (ctx->framebuffer_s.cbufs[0])
302+ etna_resource(ctx->framebuffer_s.cbufs[0]->texture)->seqno++;
303+ if (ctx->framebuffer_s.zsbuf)
304+ etna_resource(ctx->framebuffer_s.zsbuf->texture)->seqno++;
303305 if (info->index_size && indexbuf != info->index.resource)
304306 pipe_resource_reference(&indexbuf, NULL);
305307 }
--- a/src/gallium/drivers/etnaviv/etnaviv_internal.h
+++ b/src/gallium/drivers/etnaviv/etnaviv_internal.h
@@ -182,7 +182,6 @@ struct compiled_viewport_state {
182182
183183 /* Compiled pipe_framebuffer_state */
184184 struct compiled_framebuffer_state {
185- struct pipe_surface *cbuf, *zsbuf; /* keep reference to surfaces */
186185 uint32_t GL_MULTI_SAMPLE_CONFIG;
187186 uint32_t PE_COLOR_FORMAT;
188187 uint32_t PE_DEPTH_CONFIG;
--- a/src/gallium/drivers/etnaviv/etnaviv_state.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_state.c
@@ -37,6 +37,7 @@
3737 #include "etnaviv_surface.h"
3838 #include "etnaviv_translate.h"
3939 #include "etnaviv_util.h"
40+#include "util/u_framebuffer.h"
4041 #include "util/u_helpers.h"
4142 #include "util/u_inlines.h"
4243 #include "util/u_math.h"
@@ -130,7 +131,6 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
130131 assert(res->layout & ETNA_LAYOUT_BIT_TILE); /* Cannot render to linear surfaces */
131132 etna_update_render_resource(pctx, cbuf->base.texture);
132133
133- pipe_surface_reference(&cs->cbuf, &cbuf->base);
134134 cs->PE_COLOR_FORMAT =
135135 VIVS_PE_COLOR_FORMAT_FORMAT(translate_rs_format(cbuf->base.format)) |
136136 VIVS_PE_COLOR_FORMAT_COMPONENTS__MASK |
@@ -182,7 +182,6 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
182182
183183 nr_samples_color = cbuf->base.texture->nr_samples;
184184 } else {
185- pipe_surface_reference(&cs->cbuf, NULL);
186185 /* Clearing VIVS_PE_COLOR_FORMAT_COMPONENTS__MASK and
187186 * VIVS_PE_COLOR_FORMAT_OVERWRITE prevents us from overwriting the
188187 * color target */
@@ -201,7 +200,6 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
201200
202201 etna_update_render_resource(pctx, zsbuf->base.texture);
203202
204- pipe_surface_reference(&cs->zsbuf, &zsbuf->base);
205203 assert(res->layout &ETNA_LAYOUT_BIT_TILE); /* Cannot render to linear surfaces */
206204
207205 uint32_t depth_format = translate_depth_format(zsbuf->base.format);
@@ -252,7 +250,6 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
252250
253251 nr_samples_depth = zsbuf->base.texture->nr_samples;
254252 } else {
255- pipe_surface_reference(&cs->zsbuf, NULL);
256253 cs->PE_DEPTH_CONFIG = VIVS_PE_DEPTH_CONFIG_DEPTH_MODE_NONE;
257254 cs->PE_DEPTH_ADDR.bo = NULL;
258255 cs->PE_DEPTH_STRIDE = 0;
@@ -325,7 +322,8 @@ etna_set_framebuffer_state(struct pipe_context *pctx,
325322 */
326323 cs->PE_LOGIC_OP = VIVS_PE_LOGIC_OP_SINGLE_BUFFER(ctx->specs.single_buffer ? 3 : 0);
327324
328- ctx->framebuffer_s = *sv; /* keep copy of original structure */
325+ /* keep copy of original structure */
326+ util_copy_framebuffer_state(&ctx->framebuffer_s, sv);
329327 ctx->dirty |= ETNA_DIRTY_FRAMEBUFFER | ETNA_DIRTY_DERIVE_TS;
330328 }
331329
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1044,7 +1044,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
10441044 break;
10451045 }
10461046 case OP_MUL:
1047- if (i->dType == TYPE_F32)
1047+ if (i->dType == TYPE_F32 && !i->precise)
10481048 tryCollapseChainedMULs(i, s, imm0);
10491049
10501050 if (i->subOp == NV50_IR_SUBOP_MUL_HIGH) {
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -348,20 +348,11 @@ si_get_init_multi_vgt_param(struct si_screen *sscreen,
348348 key->u.uses_gs)
349349 partial_vs_wave = true;
350350
351- /* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
351+ /* Needed for 028B6C_DISTRIBUTION_MODE != 0. (implies >= VI) */
352352 if (sscreen->has_distributed_tess) {
353353 if (key->u.uses_gs) {
354- if (sscreen->info.chip_class <= VI)
354+ if (sscreen->info.chip_class == VI)
355355 partial_es_wave = true;
356-
357- /* GPU hang workaround. */
358- if (sscreen->info.family == CHIP_TONGA ||
359- sscreen->info.family == CHIP_FIJI ||
360- sscreen->info.family == CHIP_POLARIS10 ||
361- sscreen->info.family == CHIP_POLARIS11 ||
362- sscreen->info.family == CHIP_POLARIS12 ||
363- sscreen->info.family == CHIP_VEGAM)
364- partial_vs_wave = true;
365356 } else {
366357 partial_vs_wave = true;
367358 }
@@ -417,6 +408,18 @@ si_get_init_multi_vgt_param(struct si_screen *sscreen,
417408 if (sscreen->info.max_se == 4 && !wd_switch_on_eop)
418409 ia_switch_on_eoi = true;
419410
411+ /* HW engineers suggested that PARTIAL_VS_WAVE_ON should be set
412+ * to work around a GS hang.
413+ */
414+ if (key->u.uses_gs &&
415+ (sscreen->info.family == CHIP_TONGA ||
416+ sscreen->info.family == CHIP_FIJI ||
417+ sscreen->info.family == CHIP_POLARIS10 ||
418+ sscreen->info.family == CHIP_POLARIS11 ||
419+ sscreen->info.family == CHIP_POLARIS12 ||
420+ sscreen->info.family == CHIP_VEGAM))
421+ partial_vs_wave = true;
422+
420423 /* Required by Hawaii and, for some special cases, by VI. */
421424 if (ia_switch_on_eoi &&
422425 (sscreen->info.family == CHIP_HAWAII ||
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1662,7 +1662,7 @@ static inline void si_shader_selector_key(struct pipe_context *ctx,
16621662 key->part.ps.epilog.alpha_func = si_get_alpha_test_func(sctx);
16631663
16641664 /* ps_uses_fbfetch is true only if the color buffer is bound. */
1665- if (sctx->ps_uses_fbfetch) {
1665+ if (sctx->ps_uses_fbfetch && !sctx->blitter->running) {
16661666 struct pipe_surface *cb0 = sctx->framebuffer.state.cbufs[0];
16671667 struct pipe_resource *tex = cb0->texture;
16681668
--- a/src/gallium/drivers/radeonsi/si_state_viewport.c
+++ b/src/gallium/drivers/radeonsi/si_state_viewport.c
@@ -146,6 +146,8 @@ static void si_emit_one_scissor(struct si_context *ctx,
146146 S_028254_BR_Y(final.maxy));
147147 }
148148
149+#define MAX_PA_SU_HARDWARE_SCREEN_OFFSET 8176
150+
149151 static void si_emit_guardband(struct si_context *ctx)
150152 {
151153 const struct si_state_rasterizer *rs = ctx->queued.named.rasterizer;
@@ -179,13 +181,12 @@ static void si_emit_guardband(struct si_context *ctx)
179181 int hw_screen_offset_x = (vp_as_scissor.maxx + vp_as_scissor.minx) / 2;
180182 int hw_screen_offset_y = (vp_as_scissor.maxy + vp_as_scissor.miny) / 2;
181183
182- const unsigned hw_screen_offset_max = 8176;
183184 /* SI-CI need to align the offset to an ubertile consisting of all SEs. */
184185 const unsigned hw_screen_offset_alignment =
185186 ctx->chip_class >= VI ? 16 : MAX2(ctx->screen->se_tile_repeat, 16);
186187
187- hw_screen_offset_x = CLAMP(hw_screen_offset_x, 0, hw_screen_offset_max);
188- hw_screen_offset_y = CLAMP(hw_screen_offset_y, 0, hw_screen_offset_max);
188+ hw_screen_offset_x = CLAMP(hw_screen_offset_x, 0, MAX_PA_SU_HARDWARE_SCREEN_OFFSET);
189+ hw_screen_offset_y = CLAMP(hw_screen_offset_y, 0, MAX_PA_SU_HARDWARE_SCREEN_OFFSET);
189190
190191 /* Align the screen offset by dropping the low bits. */
191192 hw_screen_offset_x &= ~(hw_screen_offset_alignment - 1);
@@ -332,6 +333,20 @@ static void si_set_viewport_states(struct pipe_context *pctx,
332333 unsigned h = scissor->maxy - scissor->miny;
333334 unsigned max_extent = MAX2(w, h);
334335
336+ unsigned center_x = (scissor->maxx + scissor->minx) / 2;
337+ unsigned center_y = (scissor->maxy + scissor->miny) / 2;
338+ unsigned max_center = MAX2(center_x, center_y);
339+
340+ /* PA_SU_HARDWARE_SCREEN_OFFSET can't center viewports whose
341+ * center start farther than MAX_PA_SU_HARDWARE_SCREEN_OFFSET.
342+ * (for example, a 1x1 viewport in the lower right corner of
343+ * 16Kx16K) Such viewports need a greater guardband, so they
344+ * have to use a worse quantization mode.
345+ */
346+ unsigned distance_off_center =
347+ MAX2(0, (int)max_center - MAX_PA_SU_HARDWARE_SCREEN_OFFSET);
348+ max_extent += distance_off_center;
349+
335350 /* Determine the best quantization mode (subpixel precision),
336351 * but also leave enough space for the guardband.
337352 *
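
The comment in the hunk above explains why a small viewport far from the origin needs extra guardband budget. Plugging in the "1x1 viewport in the lower right corner of 16Kx16K" case with made-up scissor values (illustration only, not the Mesa code):

    #define MAX_PA_SU_HARDWARE_SCREEN_OFFSET 8176

    static unsigned corner_viewport_extent(void)
    {
        unsigned minx = 16383, maxx = 16384;   /* hypothetical 1x1 box */
        unsigned miny = 16383, maxy = 16384;

        unsigned w = maxx - minx, h = maxy - miny;          /* 1, 1 */
        unsigned max_extent = w > h ? w : h;                /* 1 */

        unsigned center_x = (maxx + minx) / 2;              /* 16383 */
        unsigned center_y = (maxy + miny) / 2;              /* 16383 */
        unsigned max_center = center_x > center_y ? center_x : center_y;

        unsigned off_center = max_center > MAX_PA_SU_HARDWARE_SCREEN_OFFSET ?
                              max_center - MAX_PA_SU_HARDWARE_SCREEN_OFFSET : 0;

        return max_extent + off_center;   /* 1 + 8207 = 8208, so a coarser
                                             quantization mode is selected */
    }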
--- a/src/gallium/drivers/swr/meson.build
+++ b/src/gallium/drivers/swr/meson.build
@@ -190,11 +190,7 @@ swr_arch_libs = []
190190 swr_arch_defines = []
191191
192192 swr_avx_args = cpp.first_supported_argument(
193- '-target-cpu=sandybridge', '-mavx', '-march=core-avx', '-tp=sandybridge',
194- prefix : '''
195- #if !defined(__AVX__)
196- # error
197- #endif ''',
193+ '-mavx', '-target-cpu=sandybridge', '-march=core-avx', '-tp=sandybridge',
198194 )
199195 if swr_avx_args == []
200196 error('Cannot find AVX support for swr. (these are required for SWR an all architectures.)')
@@ -215,18 +211,10 @@ endif
215211
216212 if with_swr_arches.contains('avx2')
217213 swr_avx2_args = cpp.first_supported_argument(
218- '-target-cpu=haswell', '-march=core-avx2', '-tp=haswell',
219- prefix : '''
220- #if !defined(__AVX2__)
221- # error
222- #endif ''',
214+ '-march=core-avx2', '-target-cpu=haswell', '-tp=haswell',
223215 )
224216 if swr_avx2_args == []
225- if cpp.has_argument(['-mavx2', '-mfma', '-mbmi2', '-mf16c'],
226- prefix : '''
227- #if !defined(__AVX2__)
228- # error
229- #endif ''')
217+ if cpp.has_argument(['-mavx2', '-mfma', '-mbmi2', '-mf16c'])
230218 swr_avx2_args = ['-mavx2', '-mfma', '-mbmi2', '-mf16c']
231219 else
232220 error('Cannot find AVX2 support for swr.')
@@ -248,11 +236,7 @@ endif
248236
249237 if with_swr_arches.contains('knl')
250238 swr_knl_args = cpp.first_supported_argument(
251- '-target-cpu=mic-knl', '-march=knl', '-xMIC-AVX512',
252- prefix : '''
253- #if !defined(__AVX512F__) || !defined(__AVX512ER__)
254- # error
255- #endif ''',
239+ '-march=knl', '-target-cpu=mic-knl', '-xMIC-AVX512',
256240 )
257241 if swr_knl_args == []
258242 error('Cannot find KNL support for swr.')
@@ -264,7 +248,7 @@ if with_swr_arches.contains('knl')
264248 [files_swr_common, files_swr_arch],
265249 cpp_args : [
266250 swr_cpp_args, swr_knl_args, '-DKNOB_ARCH=KNOB_ARCH_AVX512',
267- '-DKNOB_ARCH_KNIGHTS',
251+ '-DSIMD_ARCH_KNIGHTS',
268252 ],
269253 link_args : [ld_args_gc_sections],
270254 include_directories : [swr_incs],
@@ -276,11 +260,7 @@ endif
276260
277261 if with_swr_arches.contains('skx')
278262 swr_skx_args = cpp.first_supported_argument(
279- '-target-cpu=x86-skylake', '-march=skylake-avx512', '-xCORE-AVX512',
280- prefix : '''
281- #if !defined(__AVX512F__) || !defined(__AVX512BW__)
282- # error
283- #endif ''',
263+ '-march=skylake-avx512', '-target-cpu=x86-skylake', '-xCORE-AVX512',
284264 )
285265 if swr_skx_args == []
286266 error('Cannot find SKX support for swr.')
--- a/src/gallium/drivers/swr/swr_fence.cpp
+++ b/src/gallium/drivers/swr/swr_fence.cpp
@@ -50,7 +50,9 @@ swr_fence_cb(uint64_t userData, uint64_t userData2, uint64_t userData3)
5050 swr_fence_do_work(fence);
5151
5252 /* Correct value is in SwrSync data, and not the fence write field. */
53- fence->read = userData2;
53+ /* Contexts may not finish in order, but fence value always increases */
54+ if (fence->read < userData2)
55+ fence->read = userData2;
5456 }
5557
5658 /*
--- a/src/gallium/drivers/vc4/meson.build
+++ b/src/gallium/drivers/vc4/meson.build
@@ -81,8 +81,10 @@ files_libvc4 = files(
8181 'vc4_uniforms.c',
8282 )
8383
84+vc4_c_args = []
85+
8486 libvc4_neon = []
85-if with_asm_arch == 'arm'
87+if host_machine.cpu_family() == 'arm'
8688 libvc4_neon = static_library(
8789 'vc4_neon',
8890 'vc4_tiling_lt_neon.c',
@@ -91,12 +93,12 @@ if with_asm_arch == 'arm'
9193 ],
9294 c_args : '-mfpu=neon',
9395 )
96+ vc4_c_args += '-DUSE_ARM_ASM'
9497 endif
9598
96-simpenrose_c_args = []
9799 dep_simpenrose = dependency('simpenrose', required : false)
98100 if dep_simpenrose.found()
99- simpenrose_c_args = '-DUSE_VC4_SIMULATOR'
101+ vc4_c_args += '-DUSE_VC4_SIMULATOR'
100102 endif
101103
102104 libvc4 = static_library(
@@ -107,7 +109,7 @@ libvc4 = static_library(
107109 inc_gallium_drivers, inc_drm_uapi,
108110 ],
109111 link_with: libvc4_neon,
110- c_args : [c_vis_args, simpenrose_c_args],
112+ c_args : [c_vis_args, vc4_c_args],
111113 cpp_args : [cpp_vis_args],
112114 dependencies : [dep_simpenrose, dep_libdrm, dep_valgrind, idep_nir_headers],
113115 build_by_default : false,
--- a/src/gallium/drivers/vc4/vc4_tiling_lt.c
+++ b/src/gallium/drivers/vc4/vc4_tiling_lt.c
@@ -73,42 +73,46 @@ vc4_load_utile(void *cpu, void *gpu, uint32_t cpu_stride, uint32_t cpp)
7373 /* Load from the GPU in one shot, no interleave, to
7474 * d0-d7.
7575 */
76- "vldm %0, {q0, q1, q2, q3}\n"
76+ "vldm %[gpu], {q0, q1, q2, q3}\n"
7777 /* Store each 8-byte line to cpu-side destination,
7878 * incrementing it by the stride each time.
7979 */
80- "vst1.8 d0, [%1], %2\n"
81- "vst1.8 d1, [%1], %2\n"
82- "vst1.8 d2, [%1], %2\n"
83- "vst1.8 d3, [%1], %2\n"
84- "vst1.8 d4, [%1], %2\n"
85- "vst1.8 d5, [%1], %2\n"
86- "vst1.8 d6, [%1], %2\n"
87- "vst1.8 d7, [%1]\n"
88- :
89- : "r"(gpu), "r"(cpu), "r"(cpu_stride)
80+ "vst1.8 d0, [%[cpu]], %[cpu_stride]\n"
81+ "vst1.8 d1, [%[cpu]], %[cpu_stride]\n"
82+ "vst1.8 d2, [%[cpu]], %[cpu_stride]\n"
83+ "vst1.8 d3, [%[cpu]], %[cpu_stride]\n"
84+ "vst1.8 d4, [%[cpu]], %[cpu_stride]\n"
85+ "vst1.8 d5, [%[cpu]], %[cpu_stride]\n"
86+ "vst1.8 d6, [%[cpu]], %[cpu_stride]\n"
87+ "vst1.8 d7, [%[cpu]]\n"
88+ : [cpu] "+r"(cpu)
89+ : [gpu] "r"(gpu),
90+ [cpu_stride] "r"(cpu_stride)
9091 : "q0", "q1", "q2", "q3");
9192 } else {
9293 assert(gpu_stride == 16);
94+ void *cpu2 = cpu + 8;
9395 __asm__ volatile (
9496 /* Load from the GPU in one shot, no interleave, to
9597 * d0-d7.
9698 */
97- "vldm %0, {q0, q1, q2, q3};\n"
99+ "vldm %[gpu], {q0, q1, q2, q3};\n"
98100 /* Store each 16-byte line in 2 parts to the cpu-side
99101 * destination. (vld1 can only store one d-register
100102 * at a time).
101103 */
102- "vst1.8 d0, [%1], %3\n"
103- "vst1.8 d1, [%2], %3\n"
104- "vst1.8 d2, [%1], %3\n"
105- "vst1.8 d3, [%2], %3\n"
106- "vst1.8 d4, [%1], %3\n"
107- "vst1.8 d5, [%2], %3\n"
108- "vst1.8 d6, [%1]\n"
109- "vst1.8 d7, [%2]\n"
110- :
111- : "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
104+ "vst1.8 d0, [%[cpu]], %[cpu_stride]\n"
105+ "vst1.8 d1, [%[cpu2]],%[cpu_stride]\n"
106+ "vst1.8 d2, [%[cpu]], %[cpu_stride]\n"
107+ "vst1.8 d3, [%[cpu2]],%[cpu_stride]\n"
108+ "vst1.8 d4, [%[cpu]], %[cpu_stride]\n"
109+ "vst1.8 d5, [%[cpu2]],%[cpu_stride]\n"
110+ "vst1.8 d6, [%[cpu]]\n"
111+ "vst1.8 d7, [%[cpu2]]\n"
112+ : [cpu] "+r"(cpu),
113+ [cpu2] "+r"(cpu2)
114+ : [gpu] "r"(gpu),
115+ [cpu_stride] "r"(cpu_stride)
112116 : "q0", "q1", "q2", "q3");
113117 }
114118 #elif defined (PIPE_ARCH_AARCH64)
@@ -117,42 +121,46 @@ vc4_load_utile(void *cpu, void *gpu, uint32_t cpu_stride, uint32_t cpp)
117121 /* Load from the GPU in one shot, no interleave, to
118122 * d0-d7.
119123 */
120- "ld1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%0]\n"
124+ "ld1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%[gpu]]\n"
121125 /* Store each 8-byte line to cpu-side destination,
122126 * incrementing it by the stride each time.
123127 */
124- "st1 {v0.D}[0], [%1], %2\n"
125- "st1 {v0.D}[1], [%1], %2\n"
126- "st1 {v1.D}[0], [%1], %2\n"
127- "st1 {v1.D}[1], [%1], %2\n"
128- "st1 {v2.D}[0], [%1], %2\n"
129- "st1 {v2.D}[1], [%1], %2\n"
130- "st1 {v3.D}[0], [%1], %2\n"
131- "st1 {v3.D}[1], [%1]\n"
132- :
133- : "r"(gpu), "r"(cpu), "r"(cpu_stride)
128+ "st1 {v0.D}[0], [%[cpu]], %[cpu_stride]\n"
129+ "st1 {v0.D}[1], [%[cpu]], %[cpu_stride]\n"
130+ "st1 {v1.D}[0], [%[cpu]], %[cpu_stride]\n"
131+ "st1 {v1.D}[1], [%[cpu]], %[cpu_stride]\n"
132+ "st1 {v2.D}[0], [%[cpu]], %[cpu_stride]\n"
133+ "st1 {v2.D}[1], [%[cpu]], %[cpu_stride]\n"
134+ "st1 {v3.D}[0], [%[cpu]], %[cpu_stride]\n"
135+ "st1 {v3.D}[1], [%[cpu]]\n"
136+ : [cpu] "+r"(cpu)
137+ : [gpu] "r"(gpu),
138+ [cpu_stride] "r"(cpu_stride)
134139 : "v0", "v1", "v2", "v3");
135140 } else {
136141 assert(gpu_stride == 16);
142+ void *cpu2 = cpu + 8;
137143 __asm__ volatile (
138144 /* Load from the GPU in one shot, no interleave, to
139145 * d0-d7.
140146 */
141- "ld1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%0]\n"
147+ "ld1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%[gpu]]\n"
142148 /* Store each 16-byte line in 2 parts to the cpu-side
143149 * destination. (vld1 can only store one d-register
144150 * at a time).
145151 */
146- "st1 {v0.D}[0], [%1], %3\n"
147- "st1 {v0.D}[1], [%2], %3\n"
148- "st1 {v1.D}[0], [%1], %3\n"
149- "st1 {v1.D}[1], [%2], %3\n"
150- "st1 {v2.D}[0], [%1], %3\n"
151- "st1 {v2.D}[1], [%2], %3\n"
152- "st1 {v3.D}[0], [%1]\n"
153- "st1 {v3.D}[1], [%2]\n"
154- :
155- : "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
152+ "st1 {v0.D}[0], [%[cpu]], %[cpu_stride]\n"
153+ "st1 {v0.D}[1], [%[cpu2]],%[cpu_stride]\n"
154+ "st1 {v1.D}[0], [%[cpu]], %[cpu_stride]\n"
155+ "st1 {v1.D}[1], [%[cpu2]],%[cpu_stride]\n"
156+ "st1 {v2.D}[0], [%[cpu]], %[cpu_stride]\n"
157+ "st1 {v2.D}[1], [%[cpu2]],%[cpu_stride]\n"
158+ "st1 {v3.D}[0], [%[cpu]]\n"
159+ "st1 {v3.D}[1], [%[cpu2]]\n"
160+ : [cpu] "+r"(cpu),
161+ [cpu2] "+r"(cpu2)
162+ : [gpu] "r"(gpu),
163+ [cpu_stride] "r"(cpu_stride)
156164 : "v0", "v1", "v2", "v3");
157165 }
158166 #else
@@ -174,40 +182,44 @@ vc4_store_utile(void *gpu, void *cpu, uint32_t cpu_stride, uint32_t cpp)
174182 /* Load each 8-byte line from cpu-side source,
175183 * incrementing it by the stride each time.
176184 */
177- "vld1.8 d0, [%1], %2\n"
178- "vld1.8 d1, [%1], %2\n"
179- "vld1.8 d2, [%1], %2\n"
180- "vld1.8 d3, [%1], %2\n"
181- "vld1.8 d4, [%1], %2\n"
182- "vld1.8 d5, [%1], %2\n"
183- "vld1.8 d6, [%1], %2\n"
184- "vld1.8 d7, [%1]\n"
185+ "vld1.8 d0, [%[cpu]], %[cpu_stride]\n"
186+ "vld1.8 d1, [%[cpu]], %[cpu_stride]\n"
187+ "vld1.8 d2, [%[cpu]], %[cpu_stride]\n"
188+ "vld1.8 d3, [%[cpu]], %[cpu_stride]\n"
189+ "vld1.8 d4, [%[cpu]], %[cpu_stride]\n"
190+ "vld1.8 d5, [%[cpu]], %[cpu_stride]\n"
191+ "vld1.8 d6, [%[cpu]], %[cpu_stride]\n"
192+ "vld1.8 d7, [%[cpu]]\n"
185193 /* Load from the GPU in one shot, no interleave, to
186194 * d0-d7.
187195 */
188- "vstm %0, {q0, q1, q2, q3}\n"
189- :
190- : "r"(gpu), "r"(cpu), "r"(cpu_stride)
196+ "vstm %[gpu], {q0, q1, q2, q3}\n"
197+ : [cpu] "r"(cpu)
198+ : [gpu] "r"(gpu),
199+ [cpu_stride] "r"(cpu_stride)
191200 : "q0", "q1", "q2", "q3");
192201 } else {
193202 assert(gpu_stride == 16);
203+ void *cpu2 = cpu + 8;
194204 __asm__ volatile (
195205 /* Load each 16-byte line in 2 parts from the cpu-side
196206 * destination. (vld1 can only store one d-register
197207 * at a time).
198208 */
199- "vld1.8 d0, [%1], %3\n"
200- "vld1.8 d1, [%2], %3\n"
201- "vld1.8 d2, [%1], %3\n"
202- "vld1.8 d3, [%2], %3\n"
203- "vld1.8 d4, [%1], %3\n"
204- "vld1.8 d5, [%2], %3\n"
205- "vld1.8 d6, [%1]\n"
206- "vld1.8 d7, [%2]\n"
209+ "vld1.8 d0, [%[cpu]], %[cpu_stride]\n"
210+ "vld1.8 d1, [%[cpu2]],%[cpu_stride]\n"
211+ "vld1.8 d2, [%[cpu]], %[cpu_stride]\n"
212+ "vld1.8 d3, [%[cpu2]],%[cpu_stride]\n"
213+ "vld1.8 d4, [%[cpu]], %[cpu_stride]\n"
214+ "vld1.8 d5, [%[cpu2]],%[cpu_stride]\n"
215+ "vld1.8 d6, [%[cpu]]\n"
216+ "vld1.8 d7, [%[cpu2]]\n"
207217 /* Store to the GPU in one shot, no interleave. */
208- "vstm %0, {q0, q1, q2, q3}\n"
209- :
210- : "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
218+ "vstm %[gpu], {q0, q1, q2, q3}\n"
219+ : [cpu] "+r"(cpu),
220+ [cpu2] "+r"(cpu2)
221+ : [gpu] "r"(gpu),
222+ [cpu_stride] "r"(cpu_stride)
211223 : "q0", "q1", "q2", "q3");
212224 }
213225 #elif defined (PIPE_ARCH_AARCH64)
@@ -216,38 +228,42 @@ vc4_store_utile(void *gpu, void *cpu, uint32_t cpu_stride, uint32_t cpp)
216228 /* Load each 8-byte line from cpu-side source,
217229 * incrementing it by the stride each time.
218230 */
219- "ld1 {v0.D}[0], [%1], %2\n"
220- "ld1 {v0.D}[1], [%1], %2\n"
221- "ld1 {v1.D}[0], [%1], %2\n"
222- "ld1 {v1.D}[1], [%1], %2\n"
223- "ld1 {v2.D}[0], [%1], %2\n"
224- "ld1 {v2.D}[1], [%1], %2\n"
225- "ld1 {v3.D}[0], [%1], %2\n"
226- "ld1 {v3.D}[1], [%1]\n"
231+ "ld1 {v0.D}[0], [%[cpu]], %[cpu_stride]\n"
232+ "ld1 {v0.D}[1], [%[cpu]], %[cpu_stride]\n"
233+ "ld1 {v1.D}[0], [%[cpu]], %[cpu_stride]\n"
234+ "ld1 {v1.D}[1], [%[cpu]], %[cpu_stride]\n"
235+ "ld1 {v2.D}[0], [%[cpu]], %[cpu_stride]\n"
236+ "ld1 {v2.D}[1], [%[cpu]], %[cpu_stride]\n"
237+ "ld1 {v3.D}[0], [%[cpu]], %[cpu_stride]\n"
238+ "ld1 {v3.D}[1], [%[cpu]]\n"
227239 /* Store to the GPU in one shot, no interleave. */
228- "st1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%0]\n"
229- :
230- : "r"(gpu), "r"(cpu), "r"(cpu_stride)
240+ "st1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%[gpu]]\n"
241+ : [cpu] "+r"(cpu)
242+ : [gpu] "r"(gpu),
243+ [cpu_stride] "r"(cpu_stride)
231244 : "v0", "v1", "v2", "v3");
232245 } else {
233246 assert(gpu_stride == 16);
247+ void *cpu2 = cpu + 8;
234248 __asm__ volatile (
235249 /* Load each 16-byte line in 2 parts from the cpu-side
236250 * destination. (vld1 can only store one d-register
237251 * at a time).
238252 */
239- "ld1 {v0.D}[0], [%1], %3\n"
240- "ld1 {v0.D}[1], [%2], %3\n"
241- "ld1 {v1.D}[0], [%1], %3\n"
242- "ld1 {v1.D}[1], [%2], %3\n"
243- "ld1 {v2.D}[0], [%1], %3\n"
244- "ld1 {v2.D}[1], [%2], %3\n"
245- "ld1 {v3.D}[0], [%1]\n"
246- "ld1 {v3.D}[1], [%2]\n"
253+ "ld1 {v0.D}[0], [%[cpu]], %[cpu_stride]\n"
254+ "ld1 {v0.D}[1], [%[cpu2]],%[cpu_stride]\n"
255+ "ld1 {v1.D}[0], [%[cpu]], %[cpu_stride]\n"
256+ "ld1 {v1.D}[1], [%[cpu2]],%[cpu_stride]\n"
257+ "ld1 {v2.D}[0], [%[cpu]], %[cpu_stride]\n"
258+ "ld1 {v2.D}[1], [%[cpu2]],%[cpu_stride]\n"
259+ "ld1 {v3.D}[0], [%[cpu]]\n"
260+ "ld1 {v3.D}[1], [%[cpu2]]\n"
247261 /* Store to the GPU in one shot, no interleave. */
248- "st1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%0]\n"
249- :
250- : "r"(gpu), "r"(cpu), "r"(cpu + 8), "r"(cpu_stride)
262+ "st1 {v0.2d, v1.2d, v2.2d, v3.2d}, [%[gpu]]\n"
263+ : [cpu] "+r"(cpu),
264+ [cpu2] "+r"(cpu2)
265+ : [gpu] "r"(gpu),
266+ [cpu_stride] "r"(cpu_stride)
251267 : "v0", "v1", "v2", "v3");
252268 }
253269 #else
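
The rewritten asm above demonstrates the two changelog items for vc4: operands are now named instead of positional (%[cpu] rather than %1), and the pointers that the post-incrementing vst1/vld1 forms modify are declared as in/out operands with "+r" instead of plain inputs. A minimal standalone sketch of the same GCC/Clang extended-asm pattern on 32-bit ARM (hypothetical helper, not the Mesa code):

    /* Copies two 32-bit words; the post-indexed addressing updates both
     * pointers, so they must be "+r" (read and written), not plain "r". */
    static inline void copy_two_words(unsigned char *dst, const unsigned char *src)
    {
        unsigned tmp;
        __asm__ volatile(
            "ldr %[tmp], [%[src]], #4\n"   /* load, then src += 4 */
            "str %[tmp], [%[dst]], #4\n"   /* store, then dst += 4 */
            "ldr %[tmp], [%[src]]\n"
            "str %[tmp], [%[dst]]\n"
            : [dst] "+r"(dst), [src] "+r"(src),   /* modified by the asm */
              [tmp] "=&r"(tmp)                    /* scratch register */
            :
            : "memory");
    }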
--- a/src/gallium/include/state_tracker/drisw_api.h
+++ b/src/gallium/include/state_tracker/drisw_api.h
@@ -20,7 +20,7 @@ struct drisw_loader_funcs
2020 void (*put_image2) (struct dri_drawable *dri_drawable,
2121 void *data, int x, int y, unsigned width, unsigned height, unsigned stride);
2222 void (*put_image_shm) (struct dri_drawable *dri_drawable,
23- int shmid, char *shmaddr, unsigned offset,
23+ int shmid, char *shmaddr, unsigned offset, unsigned offset_x,
2424 int x, int y, unsigned width, unsigned height, unsigned stride);
2525 };
2626
--- a/src/gallium/state_trackers/dri/drisw.c
+++ b/src/gallium/state_trackers/dri/drisw.c
@@ -79,15 +79,21 @@ put_image2(__DRIdrawable *dPriv, void *data, int x, int y,
7979
8080 static inline void
8181 put_image_shm(__DRIdrawable *dPriv, int shmid, char *shmaddr,
82- unsigned offset, int x, int y,
82+ unsigned offset, unsigned offset_x, int x, int y,
8383 unsigned width, unsigned height, unsigned stride)
8484 {
8585 __DRIscreen *sPriv = dPriv->driScreenPriv;
8686 const __DRIswrastLoaderExtension *loader = sPriv->swrast_loader;
8787
88- loader->putImageShm(dPriv, __DRI_SWRAST_IMAGE_OP_SWAP,
89- x, y, width, height, stride,
90- shmid, shmaddr, offset, dPriv->loaderPrivate);
88+ /* if we have the newer interface, don't have to add the offset_x here. */
89+ if (loader->base.version > 4 && loader->putImageShm2)
90+ loader->putImageShm2(dPriv, __DRI_SWRAST_IMAGE_OP_SWAP,
91+ x, y, width, height, stride,
92+ shmid, shmaddr, offset, dPriv->loaderPrivate);
93+ else
94+ loader->putImageShm(dPriv, __DRI_SWRAST_IMAGE_OP_SWAP,
95+ x, y, width, height, stride,
96+ shmid, shmaddr, offset + offset_x, dPriv->loaderPrivate);
9197 }
9298
9399 static inline void
@@ -179,12 +185,13 @@ drisw_put_image2(struct dri_drawable *drawable,
179185 static inline void
180186 drisw_put_image_shm(struct dri_drawable *drawable,
181187 int shmid, char *shmaddr, unsigned offset,
188+ unsigned offset_x,
182189 int x, int y, unsigned width, unsigned height,
183190 unsigned stride)
184191 {
185192 __DRIdrawable *dPriv = drawable->dPriv;
186193
187- put_image_shm(dPriv, shmid, shmaddr, offset, x, y, width, height, stride);
194+ put_image_shm(dPriv, shmid, shmaddr, offset, offset_x, x, y, width, height, stride);
188195 }
189196
190197 static inline void
--- a/src/gallium/state_trackers/nine/surface9.c
+++ b/src/gallium/state_trackers/nine/surface9.c
@@ -668,6 +668,19 @@ NineSurface9_CopyMemToDefault( struct NineSurface9 *This,
668668 From->data, From->stride,
669669 0, /* depth = 1 */
670670 &src_box);
671+ if (From->texture == D3DRTYPE_TEXTURE) {
672+ struct NineTexture9 *tex =
673+ NineTexture9(From->base.base.container);
674+ /* D3DPOOL_SYSTEMMEM with buffer content passed
675+ * from the user: execute the upload right now.
676+ * It is possible it is enough to delay upload
677+ * until the surface refcount is 0, but the
678+ * bind refcount may not be 0, and thus the dtor
679+ * is not executed (and doesn't trigger the
680+ * pending_uploads_counter check). */
681+ if (!tex->managed_buffer)
682+ nine_csmt_process(This->base.base.device);
683+ }
671684
672685 if (This->data_conversion)
673686 (void) util_format_translate(This->format_conversion,
--- a/src/gallium/state_trackers/vdpau/meson.build
+++ b/src/gallium/state_trackers/vdpau/meson.build
@@ -18,13 +18,20 @@
1818 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
1919 # SOFTWARE.
2020
21+VDPAU_MAJOR = 1
22+VDPAU_MINOR = 0
23+
2124 libvdpau_st = static_library(
2225 'vdpau_st',
2326 files(
2427 'bitmap.c', 'decode.c', 'device.c', 'ftab.c', 'htab.c', 'mixer.c',
2528 'output.c', 'preemption.c', 'presentation.c', 'query.c', 'surface.c',
2629 ),
27- c_args : [c_vis_args, '-DVER_MAJOR=1', '-DVER_MINOR=0'],
30+ c_args : [
31+ c_vis_args,
32+ '-DVER_MAJOR=@0@'.format(VDPAU_MAJOR),
33+ '-DVER_MINOR=@0@'.format(VDPAU_MINOR),
34+ ],
2835 include_directories : [
2936 inc_include, inc_src, inc_util, inc_gallium, inc_gallium_aux,
3037 ],
--- a/src/gallium/targets/vdpau/meson.build
+++ b/src/gallium/targets/vdpau/meson.build
@@ -54,13 +54,14 @@ libvdpau_gallium = shared_library(
5454 dep_thread, driver_r300, driver_r600, driver_radeonsi, driver_nouveau,
5555 ],
5656 link_depends : vdpau_link_depends,
57+ soversion : '@0@.@1@.0'.format(VDPAU_MAJOR, VDPAU_MINOR),
5758 )
5859 foreach d : [[with_gallium_r300, 'r300'],
5960 [with_gallium_r600, 'r600'],
6061 [with_gallium_radeonsi, 'radeonsi'],
6162 [with_gallium_nouveau, 'nouveau']]
6263 if d[0]
63- vdpau_drivers += 'libvdpau_@0@.so.1.0.0'.format(d[1])
64+ vdpau_drivers += 'libvdpau_@0@.so.@1@.@2@.0'.format(d[1], VDPAU_MAJOR, VDPAU_MINOR)
6465 endif
6566 endforeach
6667
--- a/src/gallium/winsys/sw/dri/dri_sw_winsys.c
+++ b/src/gallium/winsys/sw/dri/dri_sw_winsys.c
@@ -244,15 +244,20 @@ dri_sw_displaytarget_display(struct sw_winsys *ws,
244244 unsigned width, height, x = 0, y = 0;
245245 unsigned blsize = util_format_get_blocksize(dri_sw_dt->format);
246246 unsigned offset = 0;
247+ unsigned offset_x = 0;
247248 char *data = dri_sw_dt->data;
248-
249+ bool is_shm = dri_sw_dt->shmid != -1;
249250 /* Set the width to 'stride / cpp'.
250251 *
251252 * PutImage correctly clips to the width of the dst drawable.
252253 */
253254 if (box) {
254- offset = (dri_sw_dt->stride * box->y) + box->x * blsize;
255+ offset = dri_sw_dt->stride * box->y;
256+ offset_x = box->x * blsize;
255257 data += offset;
258+ /* don't add x offset for shm, the put_image_shm will deal with it */
259+ if (!is_shm)
260+ data += offset_x;
256261 x = box->x;
257262 y = box->y;
258263 width = box->width;
@@ -262,8 +267,8 @@ dri_sw_displaytarget_display(struct sw_winsys *ws,
262267 height = dri_sw_dt->height;
263268 }
264269
265- if (dri_sw_dt->shmid != -1) {
266- dri_sw_ws->lf->put_image_shm(dri_drawable, dri_sw_dt->shmid, dri_sw_dt->data, offset,
270+ if (is_shm) {
271+ dri_sw_ws->lf->put_image_shm(dri_drawable, dri_sw_dt->shmid, dri_sw_dt->data, offset, offset_x,
267272 x, y, width, height, dri_sw_dt->stride);
268273 return;
269274 }
--- a/src/gallium/winsys/vc4/drm/vc4_drm_winsys.c
+++ b/src/gallium/winsys/vc4/drm/vc4_drm_winsys.c
@@ -37,5 +37,5 @@ vc4_drm_screen_create(int fd)
3737 struct pipe_screen *
3838 vc4_drm_screen_create_renderonly(struct renderonly *ro)
3939 {
40- return vc4_screen_create(fcntl(ro->gpu_fd, F_DUPFD_CLOEXEC, 3), ro);
40+ return vc4_screen_create(ro->gpu_fd, ro);
4141 }
--- a/src/glx/drisw_glx.c
+++ b/src/glx/drisw_glx.c
@@ -201,7 +201,8 @@ bytes_per_line(unsigned pitch_bits, unsigned mul)
201201
202202 static void
203203 swrastXPutImage(__DRIdrawable * draw, int op,
204- int x, int y, int w, int h, int stride,
204+ int srcx, int srcy, int x, int y,
205+ int w, int h, int stride,
205206 int shmid, char *data, void *loaderPrivate)
206207 {
207208 struct drisw_drawable *pdp = loaderPrivate;
@@ -235,12 +236,12 @@ swrastXPutImage(__DRIdrawable * draw, int op,
235236 if (pdp->shminfo.shmid >= 0) {
236237 ximage->width = ximage->bytes_per_line / ((ximage->bits_per_pixel + 7)/ 8);
237238 ximage->height = h;
238- XShmPutImage(dpy, drawable, gc, ximage, 0, 0, x, y, w, h, False);
239+ XShmPutImage(dpy, drawable, gc, ximage, srcx, srcy, x, y, w, h, False);
239240 XSync(dpy, False);
240241 } else {
241242 ximage->width = w;
242243 ximage->height = h;
243- XPutImage(dpy, drawable, gc, ximage, 0, 0, x, y, w, h);
244+ XPutImage(dpy, drawable, gc, ximage, srcx, srcy, x, y, w, h);
244245 }
245246 ximage->data = NULL;
246247 }
@@ -254,7 +255,21 @@ swrastPutImageShm(__DRIdrawable * draw, int op,
254255 struct drisw_drawable *pdp = loaderPrivate;
255256
256257 pdp->shminfo.shmaddr = shmaddr;
257- swrastXPutImage(draw, op, x, y, w, h, stride, shmid,
258+ swrastXPutImage(draw, op, 0, 0, x, y, w, h, stride, shmid,
259+ shmaddr + offset, loaderPrivate);
260+}
261+
262+static void
263+swrastPutImageShm2(__DRIdrawable * draw, int op,
264+ int x, int y,
265+ int w, int h, int stride,
266+ int shmid, char *shmaddr, unsigned offset,
267+ void *loaderPrivate)
268+{
269+ struct drisw_drawable *pdp = loaderPrivate;
270+
271+ pdp->shminfo.shmaddr = shmaddr;
272+ swrastXPutImage(draw, op, x, 0, x, y, w, h, stride, shmid,
258273 shmaddr + offset, loaderPrivate);
259274 }
260275
@@ -263,7 +278,7 @@ swrastPutImage2(__DRIdrawable * draw, int op,
263278 int x, int y, int w, int h, int stride,
264279 char *data, void *loaderPrivate)
265280 {
266- swrastXPutImage(draw, op, x, y, w, h, stride, -1,
281+ swrastXPutImage(draw, op, 0, 0, x, y, w, h, stride, -1,
267282 data, loaderPrivate);
268283 }
269284
@@ -272,7 +287,7 @@ swrastPutImage(__DRIdrawable * draw, int op,
272287 int x, int y, int w, int h,
273288 char *data, void *loaderPrivate)
274289 {
275- swrastXPutImage(draw, op, x, y, w, h, 0, -1,
290+ swrastXPutImage(draw, op, 0, 0, x, y, w, h, 0, -1,
276291 data, loaderPrivate);
277292 }
278293
@@ -340,7 +355,7 @@ swrastGetImageShm(__DRIdrawable * read,
340355 }
341356
342357 static const __DRIswrastLoaderExtension swrastLoaderExtension_shm = {
343- .base = {__DRI_SWRAST_LOADER, 4 },
358+ .base = {__DRI_SWRAST_LOADER, 5 },
344359
345360 .getDrawableInfo = swrastGetDrawableInfo,
346361 .putImage = swrastPutImage,
@@ -349,6 +364,7 @@ static const __DRIswrastLoaderExtension swrastLoaderExtension_shm = {
349364 .getImage2 = swrastGetImage2,
350365 .putImageShm = swrastPutImageShm,
351366 .getImageShm = swrastGetImageShm,
367+ .putImageShm2 = swrastPutImageShm2,
352368 };
353369
354370 static const __DRIextension *loader_extensions_shm[] = {
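
Taken together, the drisw_api.h, drisw.c, dri_sw_winsys.c and drisw_glx.c hunks in this commit stop folding the blit x offset into the shared-memory byte offset and pass it as a separate offset_x argument instead; with a version 5 swrast loader the x offset is expressed through the source x coordinate that swrastXPutImage now forwards to XShmPutImage/XPutImage. The following is a minimal illustrative restatement of the put_image_shm() hunk in drisw.c above, not code from the commit (the standalone helper name is invented for the sketch; the struct fields and call signatures are taken from the diff):

   static void
   blit_shm(__DRIdrawable *dPriv, const __DRIswrastLoaderExtension *loader,
            int shmid, char *shmaddr, unsigned offset, unsigned offset_x,
            int x, int y, unsigned w, unsigned h, unsigned stride)
   {
      if (loader->base.version > 4 && loader->putImageShm2) {
         /* A version 5+ loader derives the source x from the destination x,
          * so offset_x does not need to be folded into the byte offset. */
         loader->putImageShm2(dPriv, __DRI_SWRAST_IMAGE_OP_SWAP,
                              x, y, w, h, stride,
                              shmid, shmaddr, offset, dPriv->loaderPrivate);
      } else {
         /* An older loader only understands a single byte offset, so fold
          * the x offset back in, matching the previous behaviour. */
         loader->putImageShm(dPriv, __DRI_SWRAST_IMAGE_OP_SWAP,
                             x, y, w, h, stride,
                             shmid, shmaddr, offset + offset_x,
                             dPriv->loaderPrivate);
      }
   }
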
--- a/src/intel/vulkan/anv_descriptor_set.c
+++ b/src/intel/vulkan/anv_descriptor_set.c
@@ -94,7 +94,22 @@ VkResult anv_CreateDescriptorSetLayout(
9494 uint32_t immutable_sampler_count = 0;
9595 for (uint32_t j = 0; j < pCreateInfo->bindingCount; j++) {
9696 max_binding = MAX2(max_binding, pCreateInfo->pBindings[j].binding);
97- if (pCreateInfo->pBindings[j].pImmutableSamplers)
97+
98+ /* From the Vulkan 1.1.97 spec for VkDescriptorSetLayoutBinding:
99+ *
100+ * "If descriptorType specifies a VK_DESCRIPTOR_TYPE_SAMPLER or
101+ * VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER type descriptor, then
102+ * pImmutableSamplers can be used to initialize a set of immutable
103+ * samplers. [...] If descriptorType is not one of these descriptor
104+ * types, then pImmutableSamplers is ignored.
105+ *
106+ * We need to be careful here and only parse pImmutableSamplers if we
107+ * have one of the right descriptor types.
108+ */
109+ VkDescriptorType desc_type = pCreateInfo->pBindings[j].descriptorType;
110+ if ((desc_type == VK_DESCRIPTOR_TYPE_SAMPLER ||
111+ desc_type == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) &&
112+ pCreateInfo->pBindings[j].pImmutableSamplers)
98113 immutable_sampler_count += pCreateInfo->pBindings[j].descriptorCount;
99114 }
100115
@@ -153,6 +168,12 @@ VkResult anv_CreateDescriptorSetLayout(
153168 if (binding == NULL)
154169 continue;
155170
171+ /* We temporarily stashed the pointer to the binding in the
172+ * immutable_samplers pointer. Now that we've pulled it back out
173+ * again, we reset immutable_samplers to NULL.
174+ */
175+ set_layout->binding[b].immutable_samplers = NULL;
176+
156177 if (binding->descriptorCount == 0)
157178 continue;
158179
@@ -170,6 +191,15 @@ VkResult anv_CreateDescriptorSetLayout(
170191 set_layout->binding[b].stage[s].sampler_index = sampler_count[s];
171192 sampler_count[s] += binding->descriptorCount;
172193 }
194+
195+ if (binding->pImmutableSamplers) {
196+ set_layout->binding[b].immutable_samplers = samplers;
197+ samplers += binding->descriptorCount;
198+
199+ for (uint32_t i = 0; i < binding->descriptorCount; i++)
200+ set_layout->binding[b].immutable_samplers[i] =
201+ anv_sampler_from_handle(binding->pImmutableSamplers[i]);
202+ }
173203 break;
174204 default:
175205 break;
@@ -221,17 +251,6 @@ VkResult anv_CreateDescriptorSetLayout(
221251 break;
222252 }
223253
224- if (binding->pImmutableSamplers) {
225- set_layout->binding[b].immutable_samplers = samplers;
226- samplers += binding->descriptorCount;
227-
228- for (uint32_t i = 0; i < binding->descriptorCount; i++)
229- set_layout->binding[b].immutable_samplers[i] =
230- anv_sampler_from_handle(binding->pImmutableSamplers[i]);
231- } else {
232- set_layout->binding[b].immutable_samplers = NULL;
233- }
234-
235254 set_layout->shader_stages |= binding->stageFlags;
236255 }
237256
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -980,9 +980,12 @@ void anv_GetPhysicalDeviceProperties(
980980 const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell) ?
981981 128 : 16;
982982
983+ const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : MAX_IMAGES;
984+
983985 VkSampleCountFlags sample_counts =
984986 isl_device_get_sample_counts(&pdevice->isl_dev);
985987
988+
986989 VkPhysicalDeviceLimits limits = {
987990 .maxImageDimension1D = (1 << 14),
988991 .maxImageDimension2D = (1 << 14),
@@ -1002,7 +1005,7 @@ void anv_GetPhysicalDeviceProperties(
10021005 .maxPerStageDescriptorUniformBuffers = 64,
10031006 .maxPerStageDescriptorStorageBuffers = 64,
10041007 .maxPerStageDescriptorSampledImages = max_samplers,
1005- .maxPerStageDescriptorStorageImages = 64,
1008+ .maxPerStageDescriptorStorageImages = max_images,
10061009 .maxPerStageDescriptorInputAttachments = 64,
10071010 .maxPerStageResources = 250,
10081011 .maxDescriptorSetSamplers = 6 * max_samplers, /* number of stages * maxPerStageDescriptorSamplers */
@@ -1011,7 +1014,7 @@ void anv_GetPhysicalDeviceProperties(
10111014 .maxDescriptorSetStorageBuffers = 6 * 64, /* number of stages * maxPerStageDescriptorStorageBuffers */
10121015 .maxDescriptorSetStorageBuffersDynamic = MAX_DYNAMIC_BUFFERS / 2,
10131016 .maxDescriptorSetSampledImages = 6 * max_samplers, /* number of stages * maxPerStageDescriptorSampledImages */
1014- .maxDescriptorSetStorageImages = 6 * 64, /* number of stages * maxPerStageDescriptorStorageImages */
1017+ .maxDescriptorSetStorageImages = 6 * max_images, /* number of stages * maxPerStageDescriptorStorageImages */
10151018 .maxDescriptorSetInputAttachments = 256,
10161019 .maxVertexInputAttributes = MAX_VBS,
10171020 .maxVertexInputBindings = MAX_VBS,
--- a/src/intel/vulkan/anv_nir.h
+++ b/src/intel/vulkan/anv_nir.h
@@ -40,7 +40,8 @@ bool anv_nir_lower_multiview(nir_shader *shader, uint32_t view_mask);
4040 bool anv_nir_lower_ycbcr_textures(nir_shader *shader,
4141 struct anv_pipeline_layout *layout);
4242
43-void anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline,
43+void anv_nir_apply_pipeline_layout(const struct anv_physical_device *pdevice,
44+ bool robust_buffer_access,
4445 struct anv_pipeline_layout *layout,
4546 nir_shader *shader,
4647 struct brw_stage_prog_data *prog_data,
--- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
+++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
@@ -428,7 +428,8 @@ setup_vec4_uniform_value(uint32_t *params, uint32_t offset, unsigned n)
428428 }
429429
430430 void
431-anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline,
431+anv_nir_apply_pipeline_layout(const struct anv_physical_device *pdevice,
432+ bool robust_buffer_access,
432433 struct anv_pipeline_layout *layout,
433434 nir_shader *shader,
434435 struct brw_stage_prog_data *prog_data,
@@ -439,7 +440,7 @@ anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline,
439440 struct apply_pipeline_layout_state state = {
440441 .shader = shader,
441442 .layout = layout,
442- .add_bounds_checks = pipeline->device->robust_buffer_access,
443+ .add_bounds_checks = robust_buffer_access,
443444 };
444445
445446 void *mem_ctx = ralloc_context(NULL);
@@ -518,8 +519,8 @@ anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline,
518519 }
519520 }
520521
521- if (map->image_count > 0) {
522- assert(map->image_count <= MAX_IMAGES);
522+ if (map->image_count > 0 && pdevice->compiler->devinfo->gen < 9) {
523+ assert(map->image_count <= MAX_GEN8_IMAGES);
523524 assert(shader->num_uniforms == prog_data->nr_params * 4);
524525 state.first_image_uniform = shader->num_uniforms;
525526 uint32_t *param = brw_stage_prog_data_add_params(prog_data,
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -532,7 +532,9 @@ anv_pipeline_lower_nir(struct anv_pipeline *pipeline,
532532
533533 /* Apply the actual pipeline layout to UBOs, SSBOs, and textures */
534534 if (layout) {
535- anv_nir_apply_pipeline_layout(pipeline, layout, nir, prog_data,
535+ anv_nir_apply_pipeline_layout(&pipeline->device->instance->physicalDevice,
536+ pipeline->device->robust_buffer_access,
537+ layout, nir, prog_data,
536538 &stage->bind_map);
537539 }
538540
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -157,7 +157,8 @@ struct gen_l3_config;
157157 #define MAX_SCISSORS 16
158158 #define MAX_PUSH_CONSTANTS_SIZE 128
159159 #define MAX_DYNAMIC_BUFFERS 16
160-#define MAX_IMAGES 8
160+#define MAX_IMAGES 64
161+#define MAX_GEN8_IMAGES 8
161162 #define MAX_PUSH_DESCRIPTORS 32 /* Minimum requirement */
162163
163164 /* The kernel relocation API has a limitation of a 32-bit delta value
@@ -1874,7 +1875,7 @@ struct anv_push_constants {
18741875 uint32_t base_work_group_id[3];
18751876
18761877 /* Image data for image_load_store on pre-SKL */
1877- struct brw_image_param images[MAX_IMAGES];
1878+ struct brw_image_param images[MAX_GEN8_IMAGES];
18781879 };
18791880
18801881 struct anv_dynamic_state {
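
For reference, combining the MAX_IMAGES / MAX_GEN8_IMAGES split above with the max_images selection in the anv_device.c hunk, the storage-image limits advertised by the driver work out as follows (simple arithmetic on values taken from this commit):

   max_images                          = devinfo->gen < 9 ? 8 : 64
   .maxPerStageDescriptorStorageImages = max_images       /* 8 on gen8 and earlier, 64 on gen9+ */
   .maxDescriptorSetStorageImages      = 6 * max_images   /* 48 on gen8 and earlier, 384 on gen9+ */
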
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -1998,6 +1998,7 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,
19981998 gl_shader_stage stage,
19991999 struct anv_state *bt_state)
20002000 {
2001+ const struct gen_device_info *devinfo = &cmd_buffer->device->info;
20012002 struct anv_subpass *subpass = cmd_buffer->state.subpass;
20022003 struct anv_cmd_pipeline_state *pipe_state;
20032004 struct anv_pipeline *pipeline;
@@ -2055,7 +2056,8 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,
20552056 if (map->surface_count == 0)
20562057 goto out;
20572058
2058- if (map->image_count > 0) {
2059+ /* We only use push constant space for images before gen9 */
2060+ if (map->image_count > 0 && devinfo->gen < 9) {
20592061 VkResult result =
20602062 anv_cmd_buffer_ensure_push_constant_field(cmd_buffer, stage, images);
20612063 if (result != VK_SUCCESS)
@@ -2168,11 +2170,15 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,
21682170 surface_state = sstate.state;
21692171 assert(surface_state.alloc_size);
21702172 add_surface_state_relocs(cmd_buffer, sstate);
2173+ if (devinfo->gen < 9) {
2174+ assert(image < MAX_GEN8_IMAGES);
2175+ struct brw_image_param *image_param =
2176+ &cmd_buffer->state.push_constants[stage]->images[image];
21712177
2172- struct brw_image_param *image_param =
2173- &cmd_buffer->state.push_constants[stage]->images[image++];
2174-
2175- *image_param = desc->image_view->planes[binding->plane].storage_image_param;
2178+ *image_param =
2179+ desc->image_view->planes[binding->plane].storage_image_param;
2180+ }
2181+ image++;
21762182 break;
21772183 }
21782184
@@ -2217,11 +2223,14 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,
22172223 assert(surface_state.alloc_size);
22182224 add_surface_reloc(cmd_buffer, surface_state,
22192225 desc->buffer_view->address);
2226+ if (devinfo->gen < 9) {
2227+ assert(image < MAX_GEN8_IMAGES);
2228+ struct brw_image_param *image_param =
2229+ &cmd_buffer->state.push_constants[stage]->images[image];
22202230
2221- struct brw_image_param *image_param =
2222- &cmd_buffer->state.push_constants[stage]->images[image++];
2223-
2224- *image_param = desc->buffer_view->storage_image_param;
2231+ *image_param = desc->buffer_view->storage_image_param;
2232+ }
2233+ image++;
22252234 break;
22262235
22272236 default:
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -1273,12 +1273,20 @@ dri3_alloc_render_buffer(struct loader_dri3_drawable *draw, unsigned int format,
12731273
12741274 free(mod_reply);
12751275
1276- buffer->image = draw->ext->image->createImageWithModifiers(draw->dri_screen,
1277- width, height,
1278- format,
1279- modifiers,
1280- count,
1281- buffer);
1276+ /* don't use createImageWithModifiers() if we have no
1277+ * modifiers, other things depend on the use flags when
1278+ * there are no modifiers to know that a buffer can be
1279+ * shared.
1280+ */
1281+ if (modifiers) {
1282+ buffer->image = draw->ext->image->createImageWithModifiers(draw->dri_screen,
1283+ width, height,
1284+ format,
1285+ modifiers,
1286+ count,
1287+ buffer);
1288+ }
1289+
12821290 free(modifiers);
12831291 }
12841292 #endif
--- a/src/mesa/state_tracker/st_manager.c
+++ b/src/mesa/state_tracker/st_manager.c
@@ -1071,7 +1071,12 @@ st_api_make_current(struct st_api *stapi, struct st_context_iface *stctxi,
10711071 st_framebuffers_purge(st);
10721072 }
10731073 else {
1074+ GET_CURRENT_CONTEXT(ctx);
1075+
10741076 ret = _mesa_make_current(NULL, NULL, NULL);
1077+
1078+ if (ctx)
1079+ st_framebuffers_purge(ctx->st);
10751080 }
10761081
10771082 return ret;