In this paper, we revisited the parallel implementation of SIMON and SPECK block ciphers. The performances of SIMON and SPECK are significantly improved by using ARM NEON SIMD (Single Instruction Multiple Data) parallel computing and OpenMP SIMT (Single Instruction Multiple Thread). We optimized the implementation on ARM NEON architecture. For optimized NEON, we reduced the number of registers for round key and increased the number of registers for plaintexts. Furthermore, we proposed the efficient forward and backward alignment methods. Finally, we maximize the performance by using SIMT (Single Instruction Multiple Threads). In the case of performance of proposed methods and proposed methods with SIMT, SIMON 128/128 encryption within 32.4, 14.3 cycles/byte, SIMON 128/192 encryption within 30.1, 15.9 cycles/byte, SIMON 128/256 encryption within 32.4, 16.9 cycles/byte, SPECK 128/128 encryption within 9.7, 5.1 cycles/byte, SPECK 128/192 encryption within 10.4, 5.6 cycles/byte, SPECK 128/256 encryption within 11.0, and 5.6 cycles/byte respectively on ARM Cortex-A53 environment.