drivers/gpu/drm/i915/i915_cmd_parser.c:808:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:811:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:814:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:808:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:811:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:814:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:808:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:811:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:814:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:808:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:811:23: error: not an lvalue
drivers/gpu/drm/i915/i915_cmd_parser.c:814:23: error: not an lvalue
If we move the shift into each case not only do we kill the warning from
smatch, but we shrink the code slightly:
text data bss dec hex filename
1267906 20587 3168 1291661 13b58d before
1267890 20587 3168 1291645 13b57d after
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171107154055.19460-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.
This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).
We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.
v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.
Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Rebrand the current (pointer | bits) pack/unpack utility macros as
explicit bit twiddling for PAGE_SIZE so that we can use the more
flexible underlying macros for different bits.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-4-chris@chris-wilson.co.uk
We want to refer to the index of the engine consistently throughout the
userspace ABI. We already have such an index through the execbuffer
engine specifier, that needs to be able to refer to each engine
specifically, so rename it the index to uabi_id to reflect its
generality beyond execbuf.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170411124306.15448-1-chris@chris-wilson.co.uk
We only need to clflush those cachelines that we have validated to be
read by the GPU. Userspace typically fills the batch length in
correctly, the exceptions tend to be explicit tests within igt.
v2: Use ptr_mask_bits() to make Mika happy
v3: cmd is not advanced on MI_BBE, so make sure to include an extra
dword in the clflush.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170310115518.13832-1-chris@chris-wilson.co.uk
In order to silence sparse:
../drivers/gpu/drm/i915/i915_gpu_error.c:200:39: warning: Using plain integer as NULL pointer
add a helper to check whether we have sse4.1 and that the desired
alignment is valid for acceleration.
v2: Explain the macros and split the two use cases between
i915_has_memcpy_from_wc() and i915_can_memcpy_from_wc().
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-1-chris@chris-wilson.co.uk
As i915.enable_cmd_parser is an unsafe option, make it read-only at
runtime. Now that it is constant, we can use the value determined during
initialisation as to whether we need the cmdparser at execbuffer time.
v2: Remove the inline for its single user, it is clear enough (and
shorter) without!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161124125851.6615-1-chris@chris-wilson.co.uk
No sense in keeping the cmd_descriptor and cmd_table structs in
i915_drv.h, now that they are no longer referenced externally.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479942147-9837-1-git-send-email-matthew.auld@intel.com
Doing cmd_header >> 29 to extract our 3-bit client value where we know
cmd_header is a u32 shouldn't then also require the use of a mask. So
remove the redundant operation and get rid of INSTR_CLIENT_MASK now that
there are no longer any users.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479163174-29686-1-git-send-email-matthew.auld@intel.com
Being able to program OACONTROL from a non-privileged batch buffer is
not sufficient to be able to configure the OA unit. This was originally
allowed to help enable Mesa to expose OA counters via the
INTEL_performance_query extension, but the current implementation based
on programming OACONTROL via a batch buffer isn't able to report useable
data without a more complete OA unit configuration. Mesa handles the
possibility that writes to OACONTROL may not be allowed and so only
advertises the extension after explicitly testing that a write to
OACONTROL succeeds. Based on this; removing OACONTROL from the whitelist
should be ok for userspace.
Removing this simplifies adding a new kernel api for configuring the OA
unit without needing to consider the possibility that userspace might
trample on OACONTROL state which we'd like to start managing within
the kernel instead. In particular running any Mesa based GL application
currently results in clearing OACONTROL when initializing which would
disable the capturing of metrics.
v2:
This bumps the command parser version from 8 to 9, as the change is
visible to userspace.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161108125148.25007-1-robert@sixbynine.org
check_cmd() is checking whether a command adheres to certain
restrictions that ensure it's safe to execute within a privileged batch
buffer. Returning false implies a privilege problem, not that the
command is invalid.
The distinction makes the difference between allowing the buffer to be
executed as an unprivileged batch buffer or returning an EINVAL error to
userspace without executing anything.
In a case where userspace may want to test whether it can successfully
write to a register that needs privileges the distinction may be
important and an EINVAL error may be considered fatal.
In particular this is currently true for Mesa, which includes a test for
whether OACONTROL can be written too, but Mesa treats any error when
flushing a batch buffer as fatal, calling exit(1).
As it is currently Mesa can gracefully handle a failure to write to
OACONTROL if the command parser is disabled, but if we were to remove
OACONTROL from the parser's whitelist then the returned EINVAL would
break Mesa applications as they attempt an OACONTROL write.
This bumps the command parser version from 7 to 8, as the change is
visible to userspace.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-4-robert@sixbynine.org
OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register. Rename now before adding more gen7 OA
registers
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-3-robert@sixbynine.org
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
A significant proportion of the cmdparsing time for some batches is the
cost to find the register in the mmiotable. We ensure that those tables
are in ascending order such that we could do a binary search if it was
ever merited. It is.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-38-chris@chris-wilson.co.uk
On the blitter (and in test code), we see long sequences of repeated
commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these,
we can skip the hashtable lookup by remembering the previous command
descriptor and doing a straightforward compare of the command header.
The corollary is that we need to do one extra comparison before lookup
up new commands.
v2: Less magic mask (ok, it is still magic, but now you cannot see!)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-36-chris@chris-wilson.co.uk
The existing code's hashfunction is very suboptimal (most 3D commands
use the same bucket degrading the hash to a long list). The code even
acknowledge that the issue was known and the fix simple:
/*
* If we attempt to generate a perfect hash, we should be able to look at bits
* 31:29 of a command from a batch buffer and use the full mask for that
* client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
*/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-35-chris@chris-wilson.co.uk
For simplicity, we want to continue using a contiguous mapping of the
command buffer, but we can reduce the number of vmappings we hold by
switching over to a page-by-page copy from the user batch buffer to the
shadow. The cost for saving one linear mapping is about 5% in trivial
workloads - which is more or less the overhead in calling kmap_atomic().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-34-chris@chris-wilson.co.uk
The single largest factor in the overhead of parsing the commands is the
setup of the virtual mapping to provide a continuous block for the batch
buffer. If we keep those vmappings around (against the better judgement
of mm/vmalloc.c, which we offset by handwaving and looking suggestively
at the shrinker) we can dramatically improve the performance of the
parser for small batches (such as media workloads). Furthermore, we can
use the prepare shmem read/write functions to determine how best we
need to clflush the range (rather than every page of the object).
The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
for the throughput on small batches. (Caching just the dst vmap and
iterating over the src, doing a page by page copy is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying overhead
further.) For *this* simple test case, the cmdparser is now within a
factor of 2 of ideal performance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk
Since I have been using the BCS_TIMESTAMP to measure latency of
execution upon the blitter ring, allow regular userspace to also read
from that register. They are already allowed RCS_TIMESTAMP!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-32-chris@chris-wilson.co.uk
If the developer adds a register in the wrong order, we BUG during boot.
That makes development and testing very difficult. Let's be a bit more
friendly and disable the command parser with a big warning if the tables
are invalid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-31-chris@chris-wilson.co.uk
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.
Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.
v2: Add i915_gem_obj_finish_shmem_access() for symmetry
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk
text data bss dec hex filename
6309351 3578714 696320 10584385 a18141 vmlinux
6308391 3578714 696320 10583425 a17d81 vmlinux
Almost 1KiB of code reduction.
v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions
text data bss dec hex filename
6304579 3578778 696320 10579677 a16edd vmlinux
6303427 3578778 696320 10578525 a16a5d vmlinux
Now over 1KiB!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
Allowing register copies where the source and destination are both
whitelisted should be safe, and is useful. For example, Mesa uses
this to load the command streamer math registers with data from the
pipeline statistics counters.
v2: Reject writes to OACONTROL (and reads as well :(
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1462521014-13595-1-git-send-email-chris@chris-wilson.co.uk
If the command parser is not active, then it is appropriate to report it
as operating at version 0 as no higher mode is supported. This greatly
simplifies userspace querying for the command parser as we then do not
need to second guess when it will be active (a mixture of module
parameters and generational support, which may change over time).
v2: s/comand/command/ misspelling in comment
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1462368336-21230-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Now that we can whitelist registers only on Haswell, move HSW_SCRATCH1
and HSW_ROW_CHICKEN3 into a separate Haswell only table.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1457335830-30923-4-git-send-email-jordan.l.justen@intel.com
For Haswell, we will want another table of registers while retaining
the large common table of whitelisted registers shared by all gen7
devices.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
[danvet: Pipe patch through sed -e 's/\<ring\>/engine/g' to make it
apply.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
This is required to support glDispatchComputeIndirect for gen7.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes regression from
commit f1afe24f0e
Author: Arun Siluvery <arun.siluvery@linux.intel.com>
Date: Tue Aug 4 16:22:20 2015 +0100
drm/i915: Change SRM, LRM instructions to use correct length
which forgot to account for the length bias when declaring the fixed
length.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91844
Reported-by: Andreas Reis <andreas.reis@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was forgotten in
commit d351f6d948
Author: Francisco Jerez <currojerez@riseup.net>
Date: Fri May 29 16:44:15 2015 +0300
drm/i915: Add SCRATCH1 and ROW_CHICKEN3 to the register whitelist.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
MI_STORE_REGISTER_MEM, MI_LOAD_REGISTER_MEM instructions are not really
variable length instructions unlike MI_LOAD_REGISTER_IMM where it expects
(reg, addr) pairs so use fixed length for these instructions.
v2: rebase
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Appease checkpatch as Mika spotted in i915_reg.h - it seems
terminally unhappy about i915_cmd_parser.c so that would be a separate
patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we may like to use a bisection search on the tables in future, we
need them to be ordered. For convenience we expect the compiled tables
to be order and check on initialisation. However, the validator used the
wrong iterators failed to spot the misordered MI tables and instead
walked off into the unknown (as spotted by kasan).
Signed-off-by: Hanno Boeck <hanno@hboeck.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Again hand-assemble patch ...]
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
In the future, we may want to speed up command/register searching using
a bisection and so we require them to be in ascending order respectively
by command value or register address. However, this was not true for one
pair in the MI table; make it so.
Signed-off-by: Hanno Boeck <hanno@hboeck.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Hand-assemble patch from raw patch from Hanno and commit message from Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
In this WA we need to set GEN8_L3SQCREG4[21:21] and reset it after PIPE_CONTROL
instruction but there is a slight complication as this is applied in WA batch
where the values are only initialized once.
Dave identified an issue with the current implementation where the register value
is read once at the beginning and it is reused; this patch corrects this by saving
the register value to memory, update register with the bit of our interest and
restore it back with original value.
This implementation uses MI_LOAD_REGISTER_MEM which is currently only used
by command parser and was using a default length of 0. This is now updated
with correct length and moved to appropriate place.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Only bit 27 of SCRATCH1 and bit 6 of ROW_CHICKEN3 are allowed to be
set because of security-sensitive bits we don't want userspace to mess
with. On HSW hardware the whitelisted bits control whether atomic
read-modify-write operations are performed on L3 or on GTI, and when
set to L3 (which can be 10x-30x better performing than on GTI,
depending on the application) require great care to avoid a system
hang, so we currently program them to be handled on GTI by default.
Beignet can immediately start taking advantage of this change to
enable L3 atomics. Mesa should eventually switch to L3 atomics too,
but a number of non-trivial changes are still required so it will
continue using GTI atomics for now.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In some cases it might be unnecessary or dangerous to give userspace
the right to write arbitrary values to some register, even though it
might be desirable to give it control of some of its bits. This patch
extends the register whitelist entries to contain a mask/value pair in
addition to the register offset. For registers with non-zero mask,
any LRM writes and LRI writes where the bits of the immediate given by
the mask don't match the specified value will be rejected.
This will be used in my next patch to grant userspace partial write
access to some sensitive registers.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Until now the software command checker assumed that commands could
read or write at most a single register per packet. This is not
necessarily the case, MI_LOAD_REGISTER_IMM expects a variable-length
list of offset/value pairs and writes them in sequence. The previous
code would only check whether the first entry was valid, effectively
allowing userspace to write unrestricted registers of the MMIO space
by sending a multi-register write with a legal first register, with
potential security implications on Gen6 and 7 hardware.
Fix it by extending the drm_i915_cmd_descriptor table to represent
multi-register access and making validate_cmd() iterate for all
register offsets present in the command packet.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move the madvise logic out of the execbuffer main path into the
relatively rare allocation path, making the execbuffer manipulation less
fragile.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
vmap_batch() calculates amount of needed pages for the mapping
we are going to create. And it uses this page count as an
argument for the for_each_sg_pages() macro. The macro takes the number
of sg list entities as an argument, not the page count. So we ended
up iterating through all the pages on the mapped object, corrupting
memory past the smaller pages[] array.
Fix this by bailing out when we have enough pages.
This regression has been introduced in
commit 17cabf571e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jan 14 11:20:57 2015 +0000
drm/i915: Trim the command parser allocations
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently, the command parser tries to create a secondary batch exactly
as large as the original, and vmap both. This is open to abuse by
userspace using extremely large batch objects, but only executing very
short batches. For example, this would be if userspace were to implement
a command submission ringbuffer. However, we only need to allocate pages
for just the contents of the command sequence in the batch - all
relocations copied to the secondary batch will reference the original
batch and so there can be no access to the secondary batch outside of
the explicit execution region.
Testcase: igt/gem_exec_big #ivb,byt,hsw
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This will allow us to read the number of dispatched compute threads
for GL_ARB_pipeline_statistics_query.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move it to a separate function since the main do_execbuffer function
already has so much going on.
v2:
- Move pin/unpin calls inside i915_parse_cmds() (Chris W, v4 7/7
feedback)
Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously we couldn't trust the user-supplied batch length because
it came directly from userspace (i.e. untrusted code). It would have
affected what commands software parsed without regard to what hardware
would actually execute, leaving a potential hole.
With the parser now copying the user supplied batch buffer and writing
MI_NOP commands to any space after the copied region, we can safely use
the batch length input. This should be a performance win as the actual
batch length is frequently much smaller than the allocated object size.
v2: Fix handling of non-zero batch_start_offset
Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch sets up all of the tracking and copying necessary to
use batch pools with the command parser and dispatches the copied
(shadow) batch to the hardware.
After this patch, the parser is in 'enabling' mode.
Note that performance takes a hit from the copy in some cases
and will likely need some work. At a rough pass, the memcpy
appears to be the bottleneck. Without having done a deeper
analysis, two ideas that come to mind are:
1) Copy sections of the batch at a time, as they are reached
by parsing. Might improve cache locality.
2) Copy only up to the userspace-supplied batch length and
memset the rest of the buffer. Reduces the number of reads.
v2:
- Remove setting the capacity of the pool
- One global pool instead of per-ring pools
- Replace batch_obj with shadow_batch_obj and hook into eb->vmas
- Memset any space in the shadow batch beyond what gets copied
- Rebased on execlist prep refactoring
v3:
- Rebase on chained batch handling
- Squash in setting the secure dispatch flag
- Add a note about the interaction w/secure dispatch pinning
- Check for request->batch_obj == NULL in i915_gem_free_request
v4:
- Fix read domains for shadow_batch_obj
- Remove the set_to_gtt_domain call from i915_parse_cmds
- ggtt_pin/unpin in the parser block to simplify error handling
- Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
- Remove i915_gem_batch_pool_put calls
v5:
- Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after
the parser (danvet, from v4 0/7 feedback)
Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Was missing.
Issue: VIZ-4701
Signed-off-by: Michael H. Nguyen <michael.h.nguyen@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that sanity prevails and we have the clean split between software
init and starting the engines we can drop all the "have we allocate
this struct already?" nonsense.
Execlist code could benefit quite a bit more still, but that's for
another patch.
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
The predicate source registers are needed to implement conditional
rendering without stalling. The two source registers are used to load
the previous values of the PS_DEPTH_COUNT register saved from
PIPE_CONTROL commands. These can then be compared and used to set the
predicate enable bit via the MI_PREDICATE command.
The command parser version number is increased to 2 to make it easier
to detect the new functionality in user space.
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge drm-next so that I can keep merging patches. Specifically I
want:
- atomic stuff, yay!
- eld parsing patch from Jani.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Just various stuff all over from a bunch of people. Shortlog gives a beter
overview, it's really all misc drm patches.
* tag 'topic/core-stuff-2014-11-05' of git://anongit.freedesktop.org/drm-intel:
drm/edid: add #defines and helpers for ELD
drm/dp: Add counters in the drm_dp_aux struct for I2C NACKs and DEFERs
drm: Remove compiler BUG_ON() test
drm: Fix DRM_FORCE_ON_DIGITAL use
drm/gma500: Don't destroy DRM properties in the driver
drm/i915: Don't destroy DRM properties in the driver
drm: Add a note to drm_property_create() about property lifetime
gpu: drm: Fix warning caused by a parameter description in drm_crtc.c
drm/dp-helper: Move the legacy helpers to gma500
drm/crtc: Remove duplicated ioctl code
drm/crtc: Fix two typos
gpu:drm: Fix typo in Documentation/DocBook/drm.xml
gpu: drm: drm_dp_mst_topology.c: Fix improper use of strncat
drm: drm_err: Remove unnecessary __func__ argument
drm: Implement O_NONBLOCK support on /dev/dri/cardN
libva uses chained batch buffers in a way that the command parser
can't generally handle. Fortunately, libva doesn't need to write
registers from batch buffers in the way that mesa does, so this
patch causes the driver to fall back to non-secure dispatch if
the parser detects a chained batch buffer.
Note: The 2nd hunk to munge the error code of the parser looks a bit
superflous. At least until we have the batch copy code ready and can
run the cmd parser in granting mode. But it isn't since we still need
to let existing libva buffers pass (though not with elevated privs
ofc!).
Testcase: igt/gem_exec_parse/chained-batch
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Add note - this confused me in review and Brad clarified
things (after a few mails ...).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ok, new attempt, this time around with full ppgtt disabled again.
drm-intel-next-2014-10-03:
- first batch of skl stage 1 enabling
- fixes from Rodrigo to the PSR, fbc and sink crc code
- kerneldoc for the frontbuffer tracking code, runtime pm code and the basic
interrupt enable/disable functions
- smaller stuff all over
drm-intel-next-2014-09-19:
- bunch more i830M fixes from Ville
- full ppgtt now again enabled by default
- more ppgtt fixes from Michel Thierry and Chris Wilson
- plane config work from Gustavo Padovan
- spinlock clarifications
- piles of smaller improvements all over, as usual
* tag 'drm-intel-next-2014-10-03-no-ppgtt' of git://anongit.freedesktop.org/drm-intel: (114 commits)
Revert "drm/i915: Enable full PPGTT on gen7"
drm/i915: Update DRIVER_DATE to 20141003
drm/i915: Remove the duplicated logic between the two shrink phases
drm/i915: kerneldoc for interrupt enable/disable functions
drm/i915: Use dev_priv instead of dev in irq setup functions
drm/i915: s/pm._irqs_disabled/pm.irqs_enabled/
drm/i915: Clear TX FIFO reset master override bits on chv
drm/i915: Make sure hardware uses the correct swing margin/deemph bits on chv
drm/i915: make sink_crc return -EIO on aux read/write failure
drm/i915: Constify send buffer for intel_dp_aux_ch
drm/i915: De-magic the PSR AUX message
drm/i915: Reinstate error level message for non-simulated gpu hangs
drm/i915: Kerneldoc for intel_runtime_pm.c
drm/i915: Call runtime_pm_disable directly
drm/i915: Move intel_display_set_init_power to intel_runtime_pm.c
drm/i915: Bikeshed rpm functions name a bit.
drm/i915: Extract intel_runtime_pm.c
drm/i915: Remove intel_modeset_suspend_hw
drm/i915: spelling fixes for frontbuffer tracking kerneldoc
drm/i915: Tighting frontbuffer tracking around flips
...
This patch fix spelling typos found in drm.xml.
It is because the file is generated from comments in
source codes, I have to fix the typos within source files.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull drm updates from Dave Airlie:
"This is the main git pull for the drm,
I pretty much froze major pulls at -rc5/6 time, and haven't had much
fallout, so will probably continue doing that.
Lots of changes all over, big internal header cleanup to make it clear
drm features are legacy things and what are things that modern KMS
drivers should be using. Also big move to use the new generic fences
in all the TTM drivers.
core:
atomic prep work,
vblank rework changes, allows immediate vblank disables
major header reworking and cleanups to better delinate legacy
interfaces from what KMS drivers should be using.
cursor planes locking fixes
ttm:
move to generic fences (affects all TTM drivers)
ppc64 caching fixes
radeon:
userptr support,
uvd for old asics,
reset rework for fence changes
better buffer placement changes,
dpm feature enablement
hdmi audio support fixes
intel:
Cherryview work,
180 degree rotation,
skylake prep work,
execlist command submission
full ppgtt prep work
cursor improvements
edid caching,
vdd handling improvements
nouveau:
fence reworking
kepler memory clock work
gt21x clock work
fan control improvements
hdmi infoframe fixes
DP audio
ast:
ppc64 fixes
caching fix
rcar:
rcar-du DT support
ipuv3:
prep work for capture support
msm:
LVDS support for mdp4, new panel, gpu refactoring
exynos:
exynos3250 SoC support, drop bad mmap interface,
mipi dsi changes, and component match support"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (640 commits)
drm/mst: rework payload table allocation to conform better.
drm/ast: Fix HW cursor image
drm/radeon/kv: add uvd/vce info to dpm debugfs output
drm/radeon/ci: add uvd/vce info to dpm debugfs output
drm/radeon: export reservation_object from dmabuf to ttm
drm/radeon: cope with foreign fences inside the reservation object
drm/radeon: cope with foreign fences inside display
drm/core: use helper to check driver features
drm/radeon/cik: write gfx ucode version to ucode addr reg
drm/radeon/si: print full CS when we hit a packet 0
drm/radeon: remove unecessary includes
drm/radeon/combios: declare legacy_connector_convert as static
drm/radeon/atombios: declare connector convert tables as static
drm/radeon: drop btc_get_max_clock_from_voltage_dependency_table
drm/radeon/dpm: drop clk/voltage dependency filters for BTC
drm/radeon/dpm: drop clk/voltage dependency filters for CI
drm/radeon/dpm: drop clk/voltage dependency filters for SI
drm/radeon/dpm: drop clk/voltage dependency filters for NI
drm/radeon: disable audio when we disable hdmi (v2)
drm/radeon: split audio enable between eg and r600 (v2)
...
Ring init and cleanup are not balanced because we re-init the rings on
resume without having cleaned them up on suspend. This leads to the
driver leaking the parser's hash tables with a kmemleak signature such
as this:
unreferenced object 0xffff880405960980 (size 32):
comm "systemd-udevd", pid 516, jiffies 4294896961 (age 10202.044s)
hex dump (first 32 bytes):
d0 85 46 c0 ff ff ff ff 00 00 00 00 00 00 00 00 ..F.............
98 60 28 04 04 88 ff ff 00 00 00 00 00 00 00 00 .`(.............
backtrace:
[<ffffffff81816f9e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811fa678>] kmem_cache_alloc_trace+0x168/0x2f0
[<ffffffffc03e20a5>] i915_cmd_parser_init_ring+0x2a5/0x3e0 [i915]
[<ffffffffc04088a2>] intel_init_ring_buffer+0x202/0x470 [i915]
[<ffffffffc040c998>] intel_init_vebox_ring_buffer+0x1e8/0x2b0 [i915]
[<ffffffffc03eff59>] i915_gem_init_hw+0x2f9/0x3a0 [i915]
[<ffffffffc03f0057>] i915_gem_init+0x57/0x1d0 [i915]
[<ffffffffc045e26a>] i915_driver_load+0xc0a/0x10e0 [i915]
[<ffffffffc02e0d5d>] drm_dev_register+0xad/0x100 [drm]
[<ffffffffc02e3b9f>] drm_get_pci_dev+0x8f/0x200 [drm]
[<ffffffffc03c934b>] i915_pci_probe+0x3b/0x60 [i915]
[<ffffffff81436725>] local_pci_probe+0x45/0xa0
[<ffffffff81437a69>] pci_device_probe+0xd9/0x130
[<ffffffff81524f4d>] driver_probe_device+0x12d/0x3e0
[<ffffffff815252d3>] __driver_attach+0x93/0xa0
[<ffffffff81522e1b>] bus_for_each_dev+0x6b/0xb0
This patch extends the current convention of checking whether a
resource is already allocated before allocating it during ring init.
Longer term it might make sense to only init the rings once.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83794
Tested-by: Kari Suvanto <kari.tj.suvanto@gmail.com>
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The other paths in the command parser that reject a batch all
log a message indicating the reason. We simply missed this one.
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In commit
commit 896ab1a5d5
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Aug 6 15:04:51 2014 +0200
drm/i915: Fix up checks for aliasing ppgtt
it looks like we accidentally inverted the check that the command
parser should only run when the driver enables some form of PPGTT.
Testcase: igt/gem_exec_parse
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Also drop the comment right above, all production vlv now
have hw ppgtt enabled.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A subsequent patch will no longer initialize the aliasing ppgtt if we
have full ppgtt enabled, since we simply don't need that any more.
Unfortunately a few places check for the aliasing ppgtt instead of
checking for ppgtt in general. Fix them up.
One special case are the gtt offset and size macros, which have some
code to remap the aliasing ppgtt to the global gtt. The aliasing ppgtt
is _not_ a logical address space, so passing that in as the vm is
plain and simple a bug. So just WARN about it and carry on - we have a
gracefully fall-through anyway if we can't find the vma.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Beignet needs these in order to program the L3 cache config for
OpenCL workloads, particularly when using SLM.
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the upcoming patches we plan to break the correlation between
engine command streamers (a.k.a. rings) and ringbuffers, so it
makes sense to refactor the code and make the change obvious.
No functional changes.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For clients that submit large batch buffers the command parser has
a substantial impact on performance. On my HSW ULT system performance
drops as much as ~20% on some tests. Most of the time is spent in the
command lookup code. Converting that from the current naive search to
a hash table lookup reduces the performance drop to ~10%.
The choice of value for I915_CMD_HASH_ORDER allows all commands
currently used in the parser tables to hash to their own bucket (except
for one collision on the render ring). The tradeoff is that it wastes
memory. Because the opcodes for the commands in the tables are not
particularly well distributed, reducing the order still leaves many
buckets empty. The increased collisions don't seem to have a huge
impact on the performance gain, but for now anyhow, the parser trades
memory for performance.
NB: Ville noticed that the error paths through the ring init code
will leak memory. I've not addressed that here. We can do a follow
up pass to handle all of the leaks.
v2: improved comment describing selection of hash key mask (Damien)
replace a BUG_ON() with an error return (Tvrtko, Ville)
commit message improvements
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit 60f2b4af12.
The same warning has been fixed in e5081a538a and
these two commits got merged in 74e99a84de2d0980320612db8015ba606af42114 which
caused another warning. Simply, the reverted commit casted the pointer
difference to unsigned long and the other commit changed the output type from
long to ptrdiff_t.
The other commit fixes the original warning the better way so I'm reverting
this commit now.
Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville noticed that we have this nice kerneldoc but it's not integrated
anywhere. Fix this asap!
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
These are additional registers needed for performance monitoring and
ARB_draw_indirect extensions in mesa.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76719
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
[danvet: Squash in fixup from Brad requested by Ken.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge drm-next after the big s/crtc->fb/crtc->primary->fb/
cocinelle patch to avoid endless amounts of conflict hilarity in my
-next queue for 3.16.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- Inherit/reuse firmwar framebuffers (for real this time) from Jesse, less
flicker for fastbooting.
- More flexible cloning for hdmi (Ville).
- Some PPGTT fixes from Ben.
- Ring init fixes from Naresh Kumar.
- set_cache_level regression fixes for the vma conversion from Ville&Chris.
- Conversion to the new dp aux helpers (Jani).
- Unification of runtime pm with pc8 support from Paulo, prep work for runtime
pm on other platforms than HSW.
- Larger cursor sizes (Sagar Kamble).
- Piles of improvements and fixes all over, as usual.
* tag 'drm-intel-next-2014-03-21' of git://anongit.freedesktop.org/drm-intel: (75 commits)
drm/i915: Include a note about the dangers of I915_READ64/I915_WRITE64
drm/i915/sdvo: fix questionable return value check
drm/i915: Fix unsafe loop iteration over vma whilst unbinding them
drm/i915: Enabling 128x128 and 256x256 ARGB Cursor Support
drm/i915: Print how many objects are shared in per-process stats
drm/i915: Per-process stats work better when evaluated per-process
drm/i915: remove rps local variables
drm/i915: Remove extraneous MMIO for RPS
drm/i915: Rename and comment all the RPS *stuff*
drm/i915: Store the HW min frequency as min_freq
drm/i915: Fix coding style for RPS
drm/i915: Reorganize the overclock code
drm/i915: init pm.suspended earlier
drm/i915: update the PC8 and runtime PM documentation
drm/i915: rename __hsw_do_{en, dis}able_pc8
drm/i915: kill struct i915_package_c8
drm/i915: move pc8.irqs_disabled to pm.irqs_disabled
drm/i915: remove dev_priv->pc8.enabled
drm/i915: don't get/put PC8 when getting/putting power wells
drm/i915: make intel_aux_display_runtime_get get runtime PM, not PC8
...
Conflicts:
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_dp.c
Drop the cast from the pointer diff to fix:
drivers/gpu/drm/i915/i915_cmd_parser.c:405:4: warning: format '%td' expects
argument of type 'ptrdiff_t', but argument 5 has type 'long unsigned int'
[-Wformat]
While at it, use %u for u32.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[danvet: After conflict resolution only the "While at it, ..." part
was left standing ...]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is some thought that the data from the performance counters enabled
via OACONTROL should only be available to the process that enabled counting.
To limit snooping, require that any batch buffer which sets OACONTROL to a
non-zero value also sets it back to 0 before the end of the batch.
This requires limiting OACONTROL writes to happen via MI_LOAD_REGISTER_IMM
so that we can access the value being written. This should be in line with
the expected use case for writing OACONTROL.
v2: Drop an unnecessary '? true : false'
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This brings the code a little more in line with kernel coding style.
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As suggested during review, this makes it much more obvious
when the tables are not sorted.
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are no longer users of drm_i915_private_t. Drop the typedef. Good
riddance.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Chris Wilson <chris@chris-wislon.co.uk>
[danvet: Add the hunk in i915_cmd_parser.c here which had to be
relocated to the how this was merged.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Mesa needs to be able to write OACONTROL in order to expose the
Observability Architecture's performance counters via OpenGL.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
[danvet: Add comment that this is just a temporary work-around and
that we need to check more things before we can allow OACONTROL writes
for real everywhere.]
[danvet 2: Squash in fixup to avoid a DRM_ERROR due to unsorted reg
list, spotted by Jani.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So userspace can query the kernel for command parser support.
v2: Add i915_cmd_parser_get_version(), history log, and kerneldoc
OTC-Tracker: AXIA-4631
Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PIPE_CONTROL and MI_FLUSH_DW have bits that would write to the
hardware status page. The driver stores request tracking info
there, so don't let userspace overwrite it.
v2: trailing comma fix, rebased
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Various commands that access memory have a bit to determine whether
the graphics address specified in the command should use the GGTT or
PPGTT for translation. These checks ensure that the bit indicates
PPGTT translation.
Most of these checks use the existing bit-checking infrastructure.
The PIPE_CONTROL and MI_FLUSH_DW commands, however, are multi-function
commands. The GGTT/PPGTT bit is only relevant for certain uses of the
command. As such, this change also extends the bit-checking code to
include a "condition" mask and offset. If the condition mask is non-zero
then the parser only performs the bit check when the bits specified by
the condition mask/offset are also non-zero.
NOTE: At this point in the series PPGTT must be enabled for the parser
to work correctly. If it's not enabled, userspace will not be setting
the PPGTT bits the way the parser requires. VLV is the only platform
where this is a problem, so at this point, we disable parsing for VLV.
v2: whitespace and trailing commas fixes, rebased
OTC-Tracker: AXIA-4631
Change-Id: I3f4c76b6734f1956ec47e698230f97d0998ff92b
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop the unecessary cast Jani spotted.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The driver leaves most interrupts masked during normal operation,
so there would have to be additional work to enable userspace to
safely request/receive an interrupt.
v2: trailing commas, rebased
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
MI_STORE_REGISTER_MEM, MI_LOAD_REGISTER_MEM, and MI_LOAD_REGISTER_IMM
commands allow userspace access to registers. Only certain registers
should be allowed for such access, so enable checking for those commands.
Each ring gets its own register whitelist.
MI_LOAD_REGISTER_REG on HSW also allows register access but is currently
unused by userspace components. Leave it rejected.
PIPE_CONTROL and MEDIA_VFE_STATE allow register access based on certain
bits being set. Reject those as well.
v2: trailing commas, rebased
OTC-Tracker: AXIA-4631
Change-Id: Ie614a2f0eb2e5917de809e5a17957175d24cc44f
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
These are used to implement scanline waits in the X server.
v2: Use #defines instead of magic numbers
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
These registers are currently used by mesa for blitting,
transform feedback extensions, and performance monitoring
extensions.
v2: REG64 macro
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The Intel DDX uses these to implement scanline waits in the X server.
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The spec defines most of these commands as privileged. A few others,
like the semaphore mbox command and some display commands, are also
reserved for the driver's use. Subsequent patches relax some of
these restrictions.
v2: Rebased
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add command tables defining irregular length commands for each ring.
This requires a few new command opcode definitions.
v2: Whitespace adjustment in command definitions, sparse fix for !F
OTC-Tracker: AXIA-4631
Change-Id: I064bceb457e15f46928058352afe76d918c58ef5
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
/ssd/git/drm-next/drivers/gpu/drm/i915/i915_cmd_parser.c: In function ‘i915_parse_cmds’:
/ssd/git/drm-next/drivers/gpu/drm/i915/i915_cmd_parser.c:405:4: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 has type ‘int’ [-Wformat=]
DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%d batchlen=%ld\n",
^
Signed-off-by: Dave Airlie <airlied@redhat.com>
When compiling on 32bits, I have the following warning:
drivers/gpu/drm/i915/i915_cmd_parser.c:405:4: warning: format ‘%ld’
expects argument of type ‘long int’, but argument 7 has type ‘int’
[-Wformat=]
DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X
length=%d batchlen=%ld\n",
The ptrdiff_t type has its own modifier: 't'.
Cc: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The command parser scans batch buffers submitted via execbuffer ioctls before
the driver submits them to hardware. At a high level, it looks for several
things:
1) Commands which are explicitly defined as privileged or which should only be
used by the kernel driver. The parser generally rejects such commands, with
the provision that it may allow some from the drm master process.
2) Commands which access registers. To support correct/enhanced userspace
functionality, particularly certain OpenGL extensions, the parser provides a
whitelist of registers which userspace may safely access (for both normal and
drm master processes).
3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
parser always rejects such commands.
See the overview comment in the source for more details.
This patch only implements the logic. Subsequent patches will build the tables
that drive the parser.
v2: Don't set the secure bit if the parser succeeds
Fail harder during init
Makefile cleanup
Kerneldoc cleanup
Clarify module param description
Convert ints to bools in a few places
Move client/subclient defs to i915_reg.h
Remove the bits_count field
OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Appease checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>