<div dir="ltr">On 2 December 2013 11:39, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Change brw_untyped_atomic() and brw_untyped_surface_read() to take the<br> surface index as a register instead of a constant, construct the<br> message descriptor dynamically by OR'ing the surface index and other<br> descriptor bits together and use the non-immediate variant of SEND to<br> submit the surface message.<br> ---<br> src/mesa/drivers/dri/i965/brw_eu.h | 18 +-<br> src/mesa/drivers/dri/i965/brw_eu_emit.c | 200 +++++++++++++++--------<br> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 7 +-<br> src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 7 +-<br> 4 files changed, 147 insertions(+), 85 deletions(-)<br></blockquote><div><br></div><div>There's something a bit hacky about the way brw_set_message_descriptor() has always worked, and IMHO this patch makes that hackiness a lot more troublesome.<br> </div><div><br></div><div>The hackiness is: instead of computing the message descriptor we want as an int32 and then passing it to brw_set_src1() using brw_imm_d(), brw_set_message_descriptor() would pass brw_imm_d(0) to brw_set_src1(), and then it would poke the proper message descriptor bits directly into brw_instruction::bits3 (which holds the immediate value used by the instruction).<br> <br></div><div>Prior to this patch, that hackiness was confined to just SEND instructions. 
But with this patch, brw_load_indirect_message_descriptor() now applies the same sort of hack to MOV and OR instructions in order to set the msg_length, response_length, and header_present fields of the message descriptor that's being dynamically computed.<br> <br></div><div>I would much rather we first refactored the code to deal with message descriptors in a non-hacky way. That is, instead of setting the immediate value to 0 and then poking in the message descriptor bits, we would first compute the correct message descriptor as an int32_t, and then store it in the SEND instruction using brw_set_src1(). As part of this refactor, we would move the message descriptor bitfield definitions out of brw_instruction::bits3 and into their own independent union.<br> <br></div><div>Then, in this patch, instead of having brw_load_indirect_message_descriptor() poke the constant parts of the message descriptor into the OR or MOV instruction, it could just use the new union to set the msg_length, response_length, and header_present fields of the message descriptor, and then pass the resulting int32_t value to brw_MOV() or brw_OR() via brw_imm_d(). 
I think the resulting code would be a lot easier to understand and maintain.<br> <br></div><div>Additional comments below, though I'm not sure if they're all relevant considering the refactor I'm suggesting above.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h<br> index a6a65ca..45b421b 100644<br> --- a/src/mesa/drivers/dri/i965/brw_eu.h<br> +++ b/src/mesa/drivers/dri/i965/brw_eu.h<br> @@ -360,20 +360,20 @@ void brw_CMP(struct brw_compile *p,<br> <br> void<br> brw_untyped_atomic(struct brw_compile *p,<br> - struct brw_reg dest,<br> + struct brw_reg dst,<br> struct brw_reg mrf,<br> - GLuint atomic_op,<br> - GLuint bind_table_index,<br> - GLuint msg_length,<br> - GLuint response_length);<br> + struct brw_reg surface,<br> + unsigned atomic_op,<br> + unsigned msg_length,<br> + bool response_expected);<br> <br> void<br> brw_untyped_surface_read(struct brw_compile *p,<br> - struct brw_reg dest,<br> + struct brw_reg dst,<br> struct brw_reg mrf,<br> - GLuint bind_table_index,<br> - GLuint msg_length,<br> - GLuint response_length);<br> + struct brw_reg surface,<br> + unsigned msg_length,<br> + unsigned num_channels);<br> <br> /***********************************************************************<br> * brw_eu_util.c:<br> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c<br> index cc093e0..b94a6d1 100644<br> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c<br> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c<br> @@ -2527,23 +2527,87 @@ brw_svb_write(struct brw_compile *p,<br> send_commit_msg); /* send_commit_msg */<br> }<br> <br> +static struct brw_instruction *<br> +brw_load_indirect_message_descriptor(struct brw_compile *p,<br> + struct brw_reg dst,<br> + struct brw_reg src,<br> + unsigned msg_length,<br> + unsigned response_length,<br> + bool 
header_present)<br> +{<br> + struct brw_instruction *insn;<br> +<br> + brw_push_insn_state(p);<br> + brw_set_access_mode(p, BRW_ALIGN_1);<br> + brw_set_mask_control(p, BRW_MASK_DISABLE);<br> + brw_set_predicate_control(p, BRW_PREDICATE_NONE);<br> +<br> + if (src.file == BRW_IMMEDIATE_VALUE) {<br> + insn = brw_MOV(p, dst, brw_imm_ud(src.dw1.ud));<br></blockquote><div><br></div><div>Why not just brw_MOV(p, dst, retype(src, BRW_REGISTER_TYPE_UD))?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + } else {<br> + struct brw_reg tmp = suboffset(vec1(retype(src, BRW_REGISTER_TYPE_UD)),<br> + BRW_GET_SWZ(src.dw1.bits.swizzle, 0));<br> + insn = brw_OR(p, dst, tmp, brw_imm_ud(0));<br> + }<br> +<br> + insn->bits3.generic_gen5.msg_length = msg_length;<br> + insn->bits3.generic_gen5.response_length = response_length;<br> + insn->bits3.generic_gen5.header_present = header_present;<br> +<br> + brw_pop_insn_state(p);<br> +<br> + return insn;<br> +}<br> +<br> +static struct brw_instruction *<br> +brw_send_indirect_message(struct brw_compile *p,<br> + unsigned sfid,<br> + struct brw_reg dst,<br> + struct brw_reg mrf,<br> + struct brw_reg desc)<br> +{<br> + /* Due to a hardware limitation the message descriptor desc MUST be<br> + * stored in a0.0. That means that there's only room for one<br> + * descriptor and the surface indices of different channels in the<br> + * same SIMD thread cannot diverge. That's OK for the moment<br> + * because OpenGL requires image (and atomic counter) array<br> + * indexing to be dynamically uniform.<br> + */<br></blockquote><div><br></div><div>A few points about this:<br></div><div><br>1. I would expect a comment like this to be followed by an assertion that verifies that desc refers to address register 0.<br><br></div> <div>2. Since the phrase "dynamically uniform" doesn't appear in ARB_shader_image_load_store, a spec reference would be really good here. 
Here's what I found:<br><br></div><div>GLSL 4.20 requires image (and atomic counter) array indexing to be dynamically uniform. Further, in the "changes from revision 9 of Version 4.20" section, it says "Correct inadvertently broad indexing by restricting indexes of images and indexes of uniform blocks to being dynamically-uniform integral expressions. This correction also applies to earlier releases for uniform blocks (4.00 and 4.10)." It seems reasonable to assume that this correction should be applied to all implementations of ARB_shader_image_load_store regardless of GLSL version.<br> <br>3. Even with that clarified, I have a concern. Consider the following shader:<br><br></div><div>uniform int i;<br>uniform image2D images[4];<br><br></div><div>void main() {<br></div><div> if (int(gl_FragCoord.x) % 2 != 0) {<br> </div><div> vec4 foo = imageLoad(images[i + 1], ivec2(0, 0));<br> ...<br> }<br> ...<br>}<br><br></div><div>Is the expression "i + 1" (inside the "if" statement) allowed? It is not clear to me from the GLSL spec whether it's intended to be dynamically uniform or not. If it is intended to be allowed, then we have a problem, because our code generator will only evaluate "i + 1" for fragments that have int(gl_FragCoord.x) % 2 != 0. It will leave garbage in the register for any other fragments. So when we compute the message descriptor for the image load, if the fragment in SIMD channel 0 doesn't satisfy int(gl_FragCoord.x) % 2 != 0, a0.0 will contain garbage, and the image load will fail.<br> <br></div><div>I have two ideas to address this:<br><br></div><div>(a) generate code that selects the first active SIMD channel and copies its value into a0.0. Unfortunately this requires a nontrivial number of instructions. 
The best I can think of is something like this (assuming the value we want to load is in register N):<br> <br></div><div> cmp.e.f0 (8) null tmp<8;8,1>UD tmp<8;8,1>UD (set f0 bit in all active channels)<br></div><div> fbl (1) tmp f0 {NoMask} (find first active channel)<br></div><div> mul (1) tmp tmp 4 {NoMask} (compute byte offset of first active channel within a register)<br> </div><div> add (1) a0 tmp <32*N> {NoMask} (set a0 to point to first active channel of register N)<br></div><div> mov (1) tmp g[a0] {NoMask} (load tmp with the first active channel of register N)<br><br></div><div> (b) during compilation, keep track of which expressions are dynamically uniform. When we emit code to compute a dynamically uniform value, emit it with the {NoMask} option, so that all channels are computed.<br><br>The advantage of (a) is that it requires touching less Mesa code. The advantages of (b) are that it generates a more compact shader executable, and it paves the way for more potential future optimizations (e.g. 
we could use up less register space for dynamically uniform values, since we don't have to compute them separately for each channel, and that would ease register pressure).<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);<br> +<br> + brw_set_dest(p, insn, retype(dst, BRW_REGISTER_TYPE_UD));<br> + brw_set_src0(p, insn, retype(mrf, BRW_REGISTER_TYPE_UD));<br> + brw_set_src1(p, insn, retype(desc, BRW_REGISTER_TYPE_UD));<br> +<br> + /* On Gen6+ Message target/SFID goes in bits 27:24 of the header */<br> + insn->header.destreg__conditionalmod = sfid;<br> +<br> + return insn;<br> +}<br> +<br> +static unsigned<br> +brw_surface_payload_size(struct brw_compile *p,<br> + unsigned num_channels,<br> + bool has_simd4x2,<br> + bool has_simd16)<br></blockquote><div><br></div><div>Some comments on this function would be helpful. It wasn't clear on first reading that has_simd4x2 and has_simd16 refer to the capabilities of the hardware (e.g. has_simd4x2 is true iff the hardware has simd4x2 support for the message being sent). 
Also, it wasn't clear that this function is computing the proper response length of the SEND message.</div> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> +{<br> + if (has_simd4x2 && p->current->header.access_mode == BRW_ALIGN_16)<br> + return 1;<br> + else if (has_simd16 && p->compressed)<br> + return 2 * num_channels;<br> + else<br> + return num_channels;<br> +}<br> +<br> static void<br> brw_set_dp_untyped_atomic_message(struct brw_compile *p,<br> struct brw_instruction *insn,<br> - GLuint atomic_op,<br> - GLuint bind_table_index,<br> - GLuint msg_length,<br> - GLuint response_length,<br> - bool header_present)<br> + unsigned atomic_op,<br> + bool response_expected)<br> {<br> if (p->brw->is_haswell) {<br> - brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1,<br> - msg_length, response_length,<br> - header_present, false);<br> -<br> -<br> - if (insn->header.access_mode == BRW_ALIGN_1) {<br> - if (insn->header.execution_size != BRW_EXECUTE_16)<br> + if (p->current->header.access_mode == BRW_ALIGN_1) {<br> + if (!p->compressed)<br> insn->bits3.ud |= 1 << 12; /* SIMD8 mode */<br> <br> insn->bits3.gen7_dp.msg_type =<br> @@ -2554,95 +2618,90 @@ brw_set_dp_untyped_atomic_message(struct brw_compile *p,<br> }<br> <br> } else {<br> - brw_set_message_descriptor(p, insn, GEN7_SFID_DATAPORT_DATA_CACHE,<br> - msg_length, response_length,<br> - header_present, false);<br> -<br> insn->bits3.gen7_dp.msg_type = GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP;<br> <br> - if (insn->header.execution_size != BRW_EXECUTE_16)<br> + if (!p->compressed)<br> insn->bits3.ud |= 1 << 12; /* SIMD8 mode */<br> }<br> <br> - if (response_length)<br> + if (response_expected)<br> insn->bits3.ud |= 1 << 13; /* Return data expected */<br> <br> - insn->bits3.gen7_dp.binding_table_index = bind_table_index;<br> insn->bits3.ud |= atomic_op << 8;<br> }<br> <br> void<br> brw_untyped_atomic(struct brw_compile *p,<br> - 
struct brw_reg dest,<br> + struct brw_reg dst,<br> struct brw_reg mrf,<br> - GLuint atomic_op,<br> - GLuint bind_table_index,<br> - GLuint msg_length,<br> - GLuint response_length) {<br> - struct brw_instruction *insn = brw_next_insn(p, BRW_OPCODE_SEND);<br> + struct brw_reg surface,<br> + unsigned atomic_op,<br> + unsigned msg_length,<br> + bool response_expected)<br> +{<br> + const unsigned sfid = (p->brw->is_haswell ?<br> + HSW_SFID_DATAPORT_DATA_CACHE_1 :<br> + GEN7_SFID_DATAPORT_DATA_CACHE);<br> + const bool header_present = p->current->header.access_mode == BRW_ALIGN_1;<br> + struct brw_reg desc = retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD);<br> + struct brw_instruction *insn;<br> +<br> + insn = brw_load_indirect_message_descriptor(<br> + p, desc, surface, msg_length,<br> + brw_surface_payload_size(p, response_expected, p->brw->is_haswell, true),<br> + header_present);<br></blockquote><div><br></div><div>Passing surface directly into brw_load_indirect_message_descriptor isn't safe, because the value in the "surface" register could potentially be garbage, and brw_load_indirect_message_descriptor() forms the message descriptor by OR'ing in additional flags. That means that any bits in the message descriptor could potentially become set, leading to a GPU hang. Consider a shader like this:<br> <br></div><div>uniform int i;<br>uniform image2D images[4];<br><br><div>void main() {<br></div> vec4 foo = imageLoad(images[i], ivec2(0, 0));<br> ...<br>}<br><br></div><div>If the value of "i" is out of range, then images[i] will cause a garbage value to be loaded into the "surface" register. 
<br> </div><div><br></div><div>To fix this, I'd recommend emitting an AND instruction to select just the lower 8 bits (the binding table index) from the surface register.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> - brw_set_dest(p, insn, retype(dest, BRW_REGISTER_TYPE_UD));<br> - brw_set_src0(p, insn, retype(mrf, BRW_REGISTER_TYPE_UD));<br> - brw_set_src1(p, insn, brw_imm_d(0));<br> brw_set_dp_untyped_atomic_message(<br> - p, insn, atomic_op, bind_table_index, msg_length, response_length,<br> - insn->header.access_mode == BRW_ALIGN_1);<br> + p, insn, atomic_op, response_expected);<br> +<br> + brw_send_indirect_message(p, sfid, dst, mrf, desc);<br> }<br> <br> static void<br> brw_set_dp_untyped_surface_read_message(struct brw_compile *p,<br> struct brw_instruction *insn,<br> - GLuint bind_table_index,<br> - GLuint msg_length,<br> - GLuint response_length,<br> - bool header_present)<br> + unsigned num_channels)<br> {<br> - const unsigned dispatch_width =<br> - (insn->header.execution_size == BRW_EXECUTE_16 ? 
16 : 8);<br> - const unsigned num_channels = response_length / (dispatch_width / 8);<br> -<br> - if (p->brw->is_haswell) {<br> - brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1,<br> - msg_length, response_length,<br> - header_present, false);<br> -<br> - insn->bits3.gen7_dp.msg_type = HSW_DATAPORT_DC_PORT1_UNTYPED_SURFACE_READ;<br> - } else {<br> - brw_set_message_descriptor(p, insn, GEN7_SFID_DATAPORT_DATA_CACHE,<br> - msg_length, response_length,<br> - header_present, false);<br> -<br> - insn->bits3.gen7_dp.msg_type = GEN7_DATAPORT_DC_UNTYPED_SURFACE_READ;<br> - }<br> + insn->bits3.gen7_dp.msg_type = (p->brw->is_haswell ?<br> + HSW_DATAPORT_DC_PORT1_UNTYPED_SURFACE_READ :<br> + GEN7_DATAPORT_DC_UNTYPED_SURFACE_READ);<br> <br> - if (insn->header.access_mode == BRW_ALIGN_1) {<br> - if (dispatch_width == 16)<br> + if (p->current->header.access_mode == BRW_ALIGN_1) {<br> + if (p->compressed)<br> insn->bits3.ud |= 1 << 12; /* SIMD16 mode */<br> else<br> insn->bits3.ud |= 2 << 12; /* SIMD8 mode */<br> + } else {<br> + insn->bits3.ud |= 0 << 12; /* SIMD4x2 mode */<br> }<br> <br> - insn->bits3.gen7_dp.binding_table_index = bind_table_index;<br> -<br> /* Set mask of 32-bit channels to drop. */<br> insn->bits3.ud |= (0xf & (0xf << num_channels)) << 8;<br> }<br> <br> void<br> brw_untyped_surface_read(struct brw_compile *p,<br> - struct brw_reg dest,<br> + struct brw_reg dst,<br> struct brw_reg mrf,<br> - GLuint bind_table_index,<br> - GLuint msg_length,<br> - GLuint response_length)<br> + struct brw_reg surface,<br> + unsigned msg_length,<br> + unsigned num_channels)<br> {<br> - struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND);<br> + const unsigned sfid = (p->brw->is_haswell ? 
HSW_SFID_DATAPORT_DATA_CACHE_1 :<br> + GEN7_SFID_DATAPORT_DATA_CACHE);<br> + const bool header_present = p->current->header.access_mode == BRW_ALIGN_1;<br> + struct brw_reg desc = retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD);<br> + struct brw_instruction *insn;<br> +<br> + insn = brw_load_indirect_message_descriptor(<br> + p, desc, surface, msg_length,<br> + brw_surface_payload_size(p, num_channels, true, true),<br> + header_present);<br></blockquote><div><br></div><div>The same situation with the "surface" register applies here.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> - brw_set_dest(p, insn, retype(dest, BRW_REGISTER_TYPE_UD));<br> - brw_set_src0(p, insn, retype(mrf, BRW_REGISTER_TYPE_UD));<br> brw_set_dp_untyped_surface_read_message(<br> - p, insn, bind_table_index, msg_length, response_length,<br> - insn->header.access_mode == BRW_ALIGN_1);<br> + p, insn, num_channels);<br> +<br> + brw_send_indirect_message(p, sfid, dst, mrf, desc);<br> }<br> <br> /**<br> @@ -2681,8 +2740,13 @@ void brw_shader_time_add(struct brw_compile *p,<br> BRW_ARF_NULL, 0));<br> brw_set_src0(p, send, brw_vec1_reg(payload.file,<br> <a href="http://payload.nr" target="_blank">payload.nr</a>, 0));<br> - brw_set_dp_untyped_atomic_message(p, send, BRW_AOP_ADD, surf_index,<br> - 2 /* message length */,<br> - 0 /* response length */,<br> - false /* header present */);<br> + brw_set_src1(p, send, brw_imm_ud(0));<br> + brw_set_dp_untyped_atomic_message(p, send, BRW_AOP_ADD, false);<br> +<br> + /* On Gen6+ Message target/SFID goes in bits 27:24 of the header */<br> + send->header.destreg__conditionalmod =<br> + (p->brw->is_haswell ? 
HSW_SFID_DATAPORT_DATA_CACHE_1 :<br> + GEN7_SFID_DATAPORT_DATA_CACHE);<br> + send->bits3.generic_gen5.msg_length = 2;<br> + send->bits3.ud |= surf_index;<br> }<br> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp<br> index 9c2c318..4eb651f 100644<br> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp<br> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp<br> @@ -1265,8 +1265,8 @@ fs_generator::generate_untyped_atomic(fs_inst *inst, struct brw_reg dst,<br> surf_index.type == BRW_REGISTER_TYPE_UD);<br> <br> brw_untyped_atomic(p, dst, brw_message_reg(inst->base_mrf),<br> - atomic_op.dw1.ud, surf_index.dw1.ud,<br> - inst->mlen, dispatch_width / 8);<br> + surf_index, atomic_op.dw1.ud,<br> + inst->mlen, true);<br> <br> brw_mark_surface_used(&c->prog_data.base, surf_index.dw1.ud);<br> }<br> @@ -1279,8 +1279,7 @@ fs_generator::generate_untyped_surface_read(fs_inst *inst, struct brw_reg dst,<br> surf_index.type == BRW_REGISTER_TYPE_UD);<br> <br> brw_untyped_surface_read(p, dst, brw_message_reg(inst->base_mrf),<br> - surf_index.dw1.ud,<br> - inst->mlen, dispatch_width / 8);<br> + surf_index, inst->mlen, 1);<br> <br> brw_mark_surface_used(&c->prog_data.base, surf_index.dw1.ud);<br> }<br> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp<br> index 83a2a27..3ac45a9 100644<br> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp<br> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp<br> @@ -859,8 +859,8 @@ vec4_generator::generate_untyped_atomic(vec4_instruction *inst,<br> surf_index.type == BRW_REGISTER_TYPE_UD);<br> <br> brw_untyped_atomic(p, dst, brw_message_reg(inst->base_mrf),<br> - atomic_op.dw1.ud, surf_index.dw1.ud,<br> - inst->mlen, 1);<br> + surf_index, atomic_op.dw1.ud,<br> + inst->mlen, true);<br> <br> brw_mark_surface_used(&prog_data->base, surf_index.dw1.ud);<br> }<br> @@ -874,8 +874,7 @@ 
vec4_generator::generate_untyped_surface_read(vec4_instruction *inst,<br> surf_index.type == BRW_REGISTER_TYPE_UD);<br> <br> brw_untyped_surface_read(p, dst, brw_message_reg(inst->base_mrf),<br> - surf_index.dw1.ud,<br> - inst->mlen, 1);<br> + surf_index, inst->mlen, 1);<br> <br> brw_mark_surface_used(&prog_data->base, surf_index.dw1.ud);<br> }<br> <span><font color="#888888">--<br> 1.8.3.4<br> </font></span></blockquote></div><br></div></div>
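<div>To make the refactor suggested above more concrete, here is a minimal sketch of computing the constant part of a message descriptor as a plain 32-bit value, together with the masking of the surface index to its low 8 bits. The helper names are hypothetical and do not exist in Mesa, and the bit positions are an assumption based on my reading of the generic_gen5 bitfield layout (header_present at bit 19, response_length in bits 24:20, msg_length in bits 28:25); they should be checked against brw_structs.h before being relied on.</div>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these names exist in
 * Mesa.  The bit positions below are an ASSUMPTION modelled on the
 * generic_gen5 message descriptor layout and must be verified. */
#define DESC_FUNCTION_CONTROL_MASK  0x7ffffu  /* bits 18:0  */
#define DESC_HEADER_PRESENT_SHIFT   19        /* bit  19    */
#define DESC_RESPONSE_LENGTH_SHIFT  20        /* bits 24:20 */
#define DESC_RESPONSE_LENGTH_MASK   0x1fu
#define DESC_MSG_LENGTH_SHIFT       25        /* bits 28:25 */
#define DESC_MSG_LENGTH_MASK        0xfu

/* Keep only the low 8 bits (the binding table index) so that a garbage
 * surface register cannot set unrelated descriptor bits when it is OR'ed
 * with the constant part of the descriptor. */
static uint32_t
sanitize_surface_index(uint32_t surface)
{
   return surface & 0xffu;
}

/* Build the descriptor as an ordinary integer instead of writing an
 * immediate of 0 and then poking bitfields into the instruction. */
static uint32_t
pack_message_descriptor(uint32_t function_control,
                        unsigned msg_length,
                        unsigned response_length,
                        bool header_present)
{
   return (function_control & DESC_FUNCTION_CONTROL_MASK) |
          ((uint32_t)header_present << DESC_HEADER_PRESENT_SHIFT) |
          ((response_length & DESC_RESPONSE_LENGTH_MASK)
           << DESC_RESPONSE_LENGTH_SHIFT) |
          ((msg_length & DESC_MSG_LENGTH_MASK) << DESC_MSG_LENGTH_SHIFT);
}
```

<div>With something along these lines, the constant half of the descriptor could be handed to brw_MOV() or brw_OR() through brw_imm_d(), rather than being written into bits3 after the instruction has been emitted.</div>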