Parallel execution can exist at many different levels, e.g. at the processor level you can have multiple processors executing threads in parallel.
At the instruction level, the execution of instructions is an iterative process, so, as with any other iterative process, parallelism, pipelining, or both may be used.
Most modern CPUs use both parallelism and pipelining in the execution of instructions. If you have, e.g., a stream of multiply instructions, the CPU may have 2 multiply pipelines, each with 4 stages. The incoming multiply instruction stream is divided into 2 substreams, which are dispatched in parallel to the 2 multiply pipelines, so 2 multiply instructions are initiated in each clock cycle, which makes the example CPU superscalar. The multiply instructions are completed after 4 clock cycles, which is their latency, when they exit the pipeline. Thus, in the example CPU, 8 multiply instructions are simultaneously in execution during each clock cycle, in various stages of the 2 parallel pipelines.
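As a rough illustration of the arithmetic (a toy sketch, not a model of any real CPU; the pipeline count, stage count and instruction ids are just the example numbers from the paragraph above), the following C++ snippet tracks which multiply occupies each stage of 2 hypothetical 4-stage pipelines. Once the pipelines are full, 2 x 4 = 8 multiplies are in flight every cycle, while 2 are issued and 2 complete per cycle, each with a latency of 4 cycles:

    // Toy illustration only: tracks which multiply instruction occupies each
    // stage of 2 example 4-stage pipelines, cycle by cycle.
    #include <cstdio>

    int main() {
        const int PIPES = 2, STAGES = 4, CYCLES = 8;
        // stage[p][s] = id of the instruction in pipeline p, stage s (-1 = empty)
        int stage[PIPES][STAGES];
        for (int p = 0; p < PIPES; ++p)
            for (int s = 0; s < STAGES; ++s)
                stage[p][s] = -1;

        int next_id = 0;
        for (int cycle = 1; cycle <= CYCLES; ++cycle) {
            // advance: the instruction in the last stage completes,
            // the others shift forward, and a new one is issued per pipeline
            for (int p = 0; p < PIPES; ++p) {
                for (int s = STAGES - 1; s > 0; --s)
                    stage[p][s] = stage[p][s - 1];
                stage[p][0] = next_id++;     // 2 new multiplies per cycle
            }
            int in_flight = 0;
            for (int p = 0; p < PIPES; ++p)
                for (int s = 0; s < STAGES; ++s)
                    if (stage[p][s] >= 0) ++in_flight;
            printf("cycle %d: %d multiplies in flight\n", cycle, in_flight);
            // from cycle 4 onward the pipelines are full: 2 * 4 = 8 in flight
        }
        return 0;
    }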
Whenever you have independent iterations, e.g. what looks like a "for" loop in the source program where there are no dependencies between distinct executions of the loop body, the iterations can be executed in 4 different ways: sequentially, interleaved, pipelined, or in parallel.
The last 3 ways can provide an acceleration in comparison with the sequential execution of the iterations. In the last 2 ways there is simultaneous execution of multiple iterations or of multiple parts of an iteration.
For parallel execution of the iterations (like in OpenMP "parallel for" or in NVIDIA CUDA), a thread must be created for each iteration (as in CUDA), or the iterations must be distributed over a team of threads (as in OpenMP), and all threads are launched to be executed in parallel by multiple hardware execution units.
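For instance, here is a minimal OpenMP sketch of parallel execution of independent iterations (the arrays and the operation are invented for illustration; in CUDA the same loop body would become a kernel launched with one thread per iteration):

    // Compile with, e.g.: g++ -fopenmp parallel_for.cpp
    #include <vector>
    #include <cstdio>

    int main() {
        const int N = 1000000;
        std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N);

        // Each execution of the loop body is independent of the others, so the
        // runtime is free to hand the iterations to multiple hardware
        // execution units (cores), which execute them in parallel.
        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
            c[i] = a[i] * b[i];

        printf("c[0] = %f\n", c[0]);
        return 0;
    }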
For pipelined execution of the iterations, the iteration body is partitioned into multiple consecutive blocks, each of which uses as input the output of the previous block (which may require adding extra storage variables to separate each block's output from its input); then a thread must be created for each such block that implements a part of the iteration, and all such threads are launched to be executed in parallel by multiple hardware execution units.
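A minimal C++ sketch of such a software pipeline, assuming an iteration body split into just 2 blocks (stages) connected by a small thread-safe queue that separates the first block's output from the second block's input; the stages and the "work" they do are invented for illustration:

    // Compile with, e.g.: g++ -std=c++17 -pthread pipeline.cpp
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <optional>
    #include <queue>
    #include <thread>

    // A tiny thread-safe queue; a real pipeline would typically bound its size.
    template <typename T>
    class Channel {
        std::queue<T> q_;
        std::mutex m_;
        std::condition_variable cv_;
        bool closed_ = false;
    public:
        void push(T v) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
            cv_.notify_one();
        }
        void close() {
            { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
            cv_.notify_all();
        }
        std::optional<T> pop() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return !q_.empty() || closed_; });
            if (q_.empty()) return std::nullopt;   // closed and drained
            T v = std::move(q_.front()); q_.pop();
            return v;
        }
    };

    int main() {
        const int N = 10;
        Channel<int> ch;   // carries each iteration's intermediate result

        // Stage 1: the first block of the original iteration body.
        std::thread stage1([&] {
            for (int i = 0; i < N; ++i)
                ch.push(i * i);        // intermediate result for iteration i
            ch.close();
        });

        // Stage 2: the second block, overlapped in time with stage 1.
        std::thread stage2([&] {
            while (auto v = ch.pop())
                printf("result: %d\n", *v + 1);
        });

        stage1.join();
        stage2.join();
        return 0;
    }

The iterations flow through the two stages in series, but at any moment the two stage threads may be working simultaneously on different iterations, which is what gives the speedup.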
These 2 ways of organizing simultaneous work, pipelined execution and parallel execution, are applicable to any kind of iterative process. Two of the most important such iterative processes are the execution of a stream of instructions and the implementation of an array operation, which performs some operation on all the elements of an array. In both cases one can use a combination of pipelining and parallelism to achieve maximum speed. For these 2 cases, the terms "instruction-level parallelism and pipelining" and "data-level parallelism and pipelining" are sometimes used. The second term is misleading, because it is not the data that are executed in parallel or pipelined, but the iterations that process the data. For any case where pipelining may be used, it is important to identify the iterative process that can be implemented in this way. For unrelated tasks, a.k.a. processes, a.k.a. threads, only 3 ways of execution are available: sequential, interleaved, and in parallel. The 4th way of execution, pipelined, is available only for iterations; the difference between iterations and unrelated tasks is that each iteration executes the same program (i.e. the loop body, when the iterations are written with the sequential loop syntax).
You are right, but this can be made more precise: pipelining is a specific form of parallelism. After all, the different stages of the pipeline are executing in parallel.
You use "parallelism" with a different meaning than me.
I use "parallel" with its original meaning "one besides the other", i.e. for spatial parallelism.
You use "parallel" with the meaning "simultaneous in time", because only with this meaning you can call pipelining as a form of parallelism.
You are not the only one who uses "parallel" for "simultaneous in time", but in my opinion this is a usage that must be discouraged, because it is not useful.
If you use "parallel" with your meaning, you must find new words to distinguish parallelism in space from parallelism in time. There are no such words in widespread use, so the best that you can do is to say "parallel in space" and "parallel in time", which is too cumbersome.
It is much more convenient to use "parallel" only with its original meaning, for parallelism in space, which in the case of parallel execution requires multiple equivalent execution units (unlike pipelined execution, which in most cases uses multiple non-equivalent execution units).
When "parallel" is restricted to parallelism in space, pipelining is not a form of parallelism. Both for pipelining and for parallelism there are multiple execution units that work simultaneously in time, but the stream of data passes in parallel through the parallel execution units and in series through the pipelined execution units.
With this meaning of "parallel", one can speak about "parallel execution" and "pipelined execution" without any ambiguity. It is extremely frequent to need to discuss both "parallel execution" and "pipelined execution" in the same context or even in the same sentence, because these 2 techniques are normally combined in various ways.
When "parallel" is used for simultaneity in time it becomes hard to distinguish parallel in space execution from pipelined execution.
The pipeline stages (say: fetch, decode, execute, memory access, register write-back) are organised "parallel in space" as transistors on the chip. The point of having a pipeline is so the stages can execute "parallel in time".
More generally, parallel in space is interesting because it is a necessary precondition for parallel in time.
In its original meaning, which is still the meaning used in mathematics and physics, "parallel" provides more information than just saying that the parallel things are located in different places in space. That information can be conveyed by other words.
One should not say that the pipeline stages are parallel when the intended meaning is that they are separate or distinct, which is the correct precondition for their ability to work simultaneously in time.
"Parallel" says about two things that they are located side-by-side, with their front-to-back axes aligned, which is true for parallel execution units where the executions of multiple operations are initiated simultaneously in all subunits, but it is false for pipelined execution units, where the executions of multiple operations are initiated sequentially, both in the first stage and in all following stages, but the initiation of an execution is done before the completion of the previous execution, leading to executions that are overlapped in time.
The difference between parallel execution and pipelined execution is the same as between parallel connections and series connections in any kind of network, e.g. electrical circuits or networks describing fluid flow.
Therefore it is better if the terms used in computing remain consistent with the terms used in mathematics, physics and engineering, which had already been in use for centuries before the computing terminology was created.