OpenCloudOS-Kernel/arch/arm
Rob Herring d25c881aa3 ARM: 7493/1: use generic unaligned.h
This moves ARM over to the asm-generic/unaligned.h header. The benefit
is better generated code, especially for ARMv7 with gcc 4.7+
compilers.

As Arnd Bergmann points out: the asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianness. The current ARM version, however, uses the
"byteshift" implementation for both.
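
For readers unfamiliar with the two approaches, here is a minimal
sketch of what a "struct" style and a "byteshift" style 32-bit load can
look like. The function and struct names are hypothetical and chosen
for illustration only; they are not the kernel's own helpers. The
packed and may_alias attributes are GCC/Clang extensions, and may_alias
is added here only so the sketch is safe to compile outside the kernel.

```c
#include <stdint.h>

/*
 * "struct" style (hypothetical names): the packed attribute tells the
 * compiler that the field may sit at any address, so it is free to
 * pick the best instruction sequence for the target, such as a single
 * ldr where the hardware supports unaligned word access.
 */
struct una_u32 {
	uint32_t x;
} __attribute__((packed, may_alias));

static inline uint32_t get_unaligned_struct32(const void *p)
{
	const struct una_u32 *ptr = (const struct una_u32 *)p;

	return ptr->x;	/* native-endian load */
}

/*
 * "byteshift" style (hypothetical names): the value is assembled from
 * individual byte loads, which fixes the byte order explicitly and
 * works on any alignment, at the cost of more instructions.
 */
static inline uint32_t get_unaligned_le32_byteshift(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static inline uint32_t get_unaligned_be32_byteshift(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}
```

The struct variant leaves instruction selection to the compiler, while
the byteshift variant spells out one byte load per byte regardless of
what the hardware could do.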

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal.  One may wonder, for
example, why r2 is wasted on the constant 0 in the second case.  And all
the mov's could be folded into the subsequent orr's, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, #4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.  This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:

long long bar (long long *x) {return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned cases do have an
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly from a pre-ARMv6 compilation.  Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-08-25 09:22:30 +01:00
boot ARM: 7492/1: add strstr declaration for decompressors 2012-08-25 09:22:30 +01:00
common ARM: dma-mapping: add support for dma_get_sgtable() 2012-07-30 12:25:47 +02:00
configs Merge branch 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm 2012-08-01 16:41:07 -07:00
include/asm ARM: 7493/1: use generic unaligned.h 2012-08-25 09:22:30 +01:00
kernel Merge branch 'audit' of git://git.linaro.org/people/rmk/linux-arm 2012-08-01 16:35:37 -07:00
lib arch: remove direct definitions of KERN_<LEVEL> uses 2012-07-30 17:25:13 -07:00
mach-at91 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2012-07-26 13:00:59 -07:00
mach-bcmring
mach-clps711x ARM: clps711x: Remove the setting of the time 2012-07-17 22:24:30 +02:00
mach-cns3xxx ARM: PCI: provide a default bus scan implementation 2012-05-13 17:12:17 +01:00
mach-davinci fbdev updates for 3.6 2012-08-01 10:45:12 -07:00
mach-dove ARM: Orion: DT support for IRQ and GPIO Controllers 2012-07-27 16:48:14 +02:00
mach-ebsa110
mach-ep93xx arm-soc: soc-specific updates 2012-07-23 16:08:40 -07:00
mach-exynos ARM: 7485/1: EXYNOS: use SGI0 to wake secondary CPUs 2012-08-11 09:26:30 +01:00
mach-footbridge ARM: PCI: provide a default bus scan implementation 2012-05-13 17:12:17 +01:00
mach-gemini
mach-h720x
mach-highbank clk: add highbank clock support 2012-07-11 17:58:47 -07:00
mach-imx GPIO changes for v3.6: 2012-07-26 13:56:38 -07:00
mach-integrator ARM: integrator: convert to common clock 2012-07-11 17:58:45 -07:00
mach-iop13xx ARM: PCI: get rid of pci_std_swizzle() 2012-05-13 17:12:16 +01:00
mach-iop32x ARM: PCI: provide a default bus scan implementation 2012-05-13 17:12:17 +01:00
mach-iop33x ARM: PCI: provide a default bus scan implementation 2012-05-13 17:12:17 +01:00
mach-ixp4xx - More robust parsing especially of xattr data in JFFS2 2012-06-01 16:55:42 -07:00
mach-kirkwood ARM: Kirkwood: Replace mrvl with marvell 2012-07-27 16:50:57 +02:00
mach-ks8695 ARM: PCI: provide a default bus scan implementation 2012-05-13 17:12:17 +01:00
mach-l7200/include/mach
mach-lpc32xx ARM: LPC32xx: Add PWM clock 2012-07-20 14:01:51 +02:00
mach-mmp ARM: mmp: add missing irqs.h 2012-08-02 10:15:59 -07:00
mach-msm ARM: MSM: use SGI0 to wake secondary CPUs 2012-07-09 17:39:36 +01:00
mach-mv78xx0 ARM: Orion: DT support for IRQ and GPIO Controllers 2012-07-27 16:48:14 +02:00
mach-mvebu arm: mvebu: generate DTBs for supported SoCs 2012-07-17 22:38:06 +02:00
mach-mxs ARM: mxs: fix compile error caused by prom_update_property change 2012-07-25 22:36:39 -07:00
mach-netx arch/arm/mach-netx/fb.c: reuse dummy clk routines for CONFIG_HAVE_CLK=n 2012-07-30 17:25:13 -07:00
mach-nomadik ARM: nomadik: bump all IRQ numbers by one 2012-06-11 12:40:14 +02:00
mach-omap1 Merge branch 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm 2012-08-01 16:41:07 -07:00
mach-omap2 ARM: arm-soc: cpuidle enablement for OMAP 2012-08-02 11:48:54 -07:00
mach-orion5x ARM: Orion: DT support for IRQ and GPIO Controllers 2012-07-27 16:48:14 +02:00
mach-picoxcell clocksource: dw_apb_timer: Add common DTS glue for dw_apb_timer 2012-07-12 17:26:09 +02:00
mach-pnx4008 arm-soc: sweeping late_initcall cleanup 2012-05-26 13:14:01 -07:00
mach-prima2 ARM: PRIMA2: delete redundant codes to restore LATCHED when timer resumes 2012-08-02 10:05:27 -07:00
mach-pxa This patch series contains a major revamp of how we collect entropy 2012-07-31 19:07:42 -07:00
mach-realview
mach-rpc ARM: fiq: change FIQ_START to a variable 2012-07-01 21:59:19 +08:00
mach-s3c24xx arm-soc: device tree description updates 2012-07-23 16:17:43 -07:00
mach-s3c64xx ARM: arm-soc soc updates, take 2 2012-07-30 09:45:53 -07:00
mach-s3c2410
mach-s3c2412
mach-s3c2440
mach-s5p64x0 arm-soc: device tree description updates 2012-07-23 16:17:43 -07:00
mach-s5pc100 arm-soc: device tree description updates 2012-07-23 16:17:43 -07:00
mach-s5pv210 arm-soc: board specific updates 2012-07-23 17:34:48 -07:00
mach-sa1100 Merge branches 'audit', 'delay', 'fixes', 'misc' and 'sta2x11' into for-linus 2012-07-27 23:06:32 +01:00
mach-shark ARM: PCI: provide a default bus scan implementation 2012-05-13 17:12:17 +01:00
mach-shmobile ARM: arm-soc board updates, take 2 2012-07-30 09:48:00 -07:00
mach-socfpga ARM: socfpga: initial support for Altera's SOCFPGA platform 2012-07-19 10:39:00 +02:00
mach-spear3xx Merge branch 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm 2012-08-01 16:41:07 -07:00
mach-spear6xx Merge branch 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm 2012-08-01 16:41:07 -07:00
mach-spear13xx Viresh has moved 2012-06-20 14:39:36 -07:00
mach-tegra Merge branch 'for-3.6' of git://gitorious.org/linux-pwm/linux-pwm 2012-07-30 09:22:37 -07:00
mach-u300 ARM: u300: convert to common clock 2012-07-11 15:36:45 -07:00
mach-ux500 MFD bits for the 3.6 merge window. 2012-07-30 12:41:17 -07:00
mach-versatile ARM: fix mach-versatile/pci.c warning 2012-07-04 17:04:57 +01:00
mach-vexpress ARM: vexpress: Config option for early printk console 2012-07-13 11:48:29 +01:00
mach-vt8500 Merge branch 'for-3.6' of git://gitorious.org/linux-pwm/linux-pwm 2012-07-30 09:22:37 -07:00
mach-w90x900
mach-zynq
mm ARM: Allow arm_memblock_steal() to remove memory from any RAM region 2012-08-13 00:22:28 +01:00
net ARM: 7421/1: bpf_jit: BPF_S_ANC_ALU_XOR_X support 2012-06-14 15:12:13 +01:00
nwfpe
oprofile ARM: 7448/1: perf: remove arm_perf_pmu_ids global enumeration 2012-07-09 17:41:10 +01:00
plat-iop ARM: PCI: provide a default bus scan implementation 2012-05-13 17:12:17 +01:00
plat-mxc ARM: SoC fixes 2012-08-02 11:48:20 -07:00
plat-nomadik i2c-nomadik: move header to <linux/platform_data/i2c-nomadik.h> 2012-07-09 11:40:40 +02:00
plat-omap Merge branch 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm 2012-08-01 16:41:07 -07:00
plat-orion ARM: Orion: Add arch support needed for I2C via DT. 2012-07-27 16:48:29 +02:00
plat-pxa Merge branch 'for-3.6' of git://gitorious.org/linux-pwm/linux-pwm 2012-07-30 09:22:37 -07:00
plat-s3c24xx ARM: fiq: change FIQ_START to a variable 2012-07-01 21:59:19 +08:00
plat-samsung Merge branch 'for-3.6' of git://gitorious.org/linux-pwm/linux-pwm 2012-07-30 09:22:37 -07:00
plat-spear Merge branch 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm 2012-08-01 16:41:07 -07:00
plat-versatile Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm 2012-07-27 15:14:26 -07:00
tools ARM: Update mach-types 2012-04-26 08:46:02 +01:00
vfp Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm 2012-08-01 16:30:45 -07:00
Kconfig ARM: arm-soc Marvell Orion device-tree updates 2012-08-02 11:50:24 -07:00
Kconfig-nommu
Kconfig.debug Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm 2012-07-27 15:14:26 -07:00
Makefile Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm 2012-07-27 15:14:26 -07:00