Parallel Language Integrated Query

Microsoft’s Language Integrated Query (LINQ) feature offers a convenient syntax for performing queries over collections of data. Using LINQ, you can easily filter items, sort items, return a projected set of items, and much more. When you use LINQ to Objects, only one thread processes all the items in your data collection sequentially; we call this a sequential query. You can potentially improve the performance of this processing by using Parallel LINQ, which can turn your sequential query into a parallel query. A parallel query internally uses tasks (queued to the default TaskScheduler) to spread the processing of the collection’s items across multiple CPUs so that multiple items are processed concurrently. As with Parallel’s methods, you will get the most benefit from Parallel LINQ if you have many items to process or if the processing of each item is a lengthy compute-bound operation.

The static System.Linq.ParallelEnumerable class (defined in System.Core.dll) implements all of the Parallel LINQ functionality, and so you must import the System.Linq namespace into your source code via C#’s using directive. In particular, this class exposes parallel versions of all the standard LINQ operators such as Where, Select, SelectMany, GroupBy, Join, OrderBy, Skip, Take, and so on. All of these methods are extension methods that extend the System.Linq.ParallelQuery<T> type. To have your LINQ to Objects query invoke the parallel versions of these methods, you must convert your sequential query (based on IEnumerable or IEnumerable<T>) to a parallel query (based on ParallelQuery or ParallelQuery<T>) using ParallelEnumerable’s AsParallel extension method, which looks like this.3

 

public static ParallelQuery<TSource> AsParallel<TSource>(this IEnumerable<TSource> source)
public static ParallelQuery AsParallel(this IEnumerable source)
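To make the conversion concrete, here is a minimal sketch that runs the same filter once as a sequential query and once as a parallel query over identical data. The IsPrime helper and the range of numbers are illustrative choices, not part of the text. The point is that calling AsParallel changes which Where and Count methods the compiler binds to (ParallelEnumerable’s instead of Enumerable’s), not what the query returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The source is an ordinary IEnumerable<int>, so this is a sequential
// LINQ to Objects query: one thread tests every number.
IEnumerable<int> numbers = Enumerable.Range(1, 100_000);
int sequentialCount = numbers.Where(IsPrime).Count();

// AsParallel wraps the same source in a ParallelQuery<int>. Where and Count
// now bind to the ParallelEnumerable extension methods, which spread the
// work across the available cores.
ParallelQuery<int> parallelNumbers = numbers.AsParallel();
int parallelCount = parallelNumbers.Where(IsPrime).Count();

Console.WriteLine($"Sequential: {sequentialCount}, parallel: {parallelCount}");

// Hypothetical helper: deliberately naive trial division, chosen only so that
// each item costs enough compute-bound work for parallelism to pay off.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}
```

Counting is a natural fit for an unordered parallel query, because the result does not depend on the order in which the threads finish their items.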

 

Here is an example of a sequential query that has been converted to a parallel query. This query returns all the obsolete methods defined within an assembly.

 

using System;
using System.Linq;
using System.Reflection;

private static void ObsoleteMethods(Assembly assembly) {
   var query =
      from type in assembly.GetExportedTypes().AsParallel()

      from method in type.GetMethods(BindingFlags.Public |
         BindingFlags.Instance | BindingFlags.Static)

      let obsoleteAttrType = typeof(ObsoleteAttribute)

      where Attribute.IsDefined(method, obsoleteAttrType)

      orderby type.FullName

      let obsoleteAttrObj = (ObsoleteAttribute)
         Attribute.GetCustomAttribute(method, obsoleteAttrType)

      select String.Format("Type={0}\nMethod={1}\nMessage={2}\n",
         type.FullName, method.ToString(), obsoleteAttrObj.Message);

   // Display the results
   foreach (var result in query) Console.WriteLine(result);
}

3 The ParallelQuery<T> class is derived from the ParallelQuery class.

 

Although uncommon, within a query you can switch from performing parallel operations back to performing sequential operations by calling ParallelEnumerable’s AsSequential method.



 

public static IEnumerable<TSource> AsSequential<TSource>(this ParallelQuery<TSource> source)

 

This method basically turns a ParallelQuery<T> back to an IEnumerable<T> so that operations performed after calling AsSequential are performed by just one thread.
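A sketch of AsSequential in a small pipeline, assuming an arbitrary range of integers as input: the filter and projection run in parallel, and the operators that follow AsSequential (Take and the position-index overload of Select) bind to System.Linq.Enumerable and run on the single thread that enumerates the results. AsOrdered is included so that "the first ten results" means the first ten in source order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Filter and project in parallel, then switch back to sequential operators.
IEnumerable<int> sequentialTail =
    Enumerable.Range(1, 1_000)
        .AsParallel()
        .AsOrdered()                 // keep source order for the Take below
        .Where(n => n % 2 == 0)      // ParallelEnumerable.Where
        .Select(n => n * n)          // ParallelEnumerable.Select
        .AsSequential();             // static type is now IEnumerable<int>

// These operators are Enumerable's, so one thread performs them.
List<string> firstTen = sequentialTail
    .Take(10)
    .Select((square, index) => $"#{index}: {square}")
    .ToList();

firstTen.ForEach(Console.WriteLine);
```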

Normally, the resulting data produced by a LINQ query is evaluated by having some thread execute a foreach statement (as shown earlier). This means that just one thread iterates over all the query’s results. If you want to have the query’s results processed in parallel, then you should process the resulting query by using ParallelEnumerable’s ForAll method.

 

public static void ForAll<TSource>(this ParallelQuery<TSource> source, Action<TSource> action)

 

This method allows multiple threads to process the results simultaneously. I could modify the earlier code to use this method as follows.

 

// Display the results
query.ForAll(Console.WriteLine);

 

However, having multiple threads call Console.WriteLine simultaneously actually hurts performance, because the Console class internally synchronizes threads, ensuring that only one at a time can access the console window. This prevents text from multiple threads from being interspersed, which would make the output unintelligible. Use the ForAll method when you intend to perform calculations on each result.
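As an illustration of using ForAll for per-result calculations rather than console output, the following sketch squares a range of integers in parallel and has ForAll feed each result into thread-safe state: a ConcurrentBag<long> and an Interlocked running total. The data and the choice of ConcurrentBag are assumptions made for the example; any state the action touches must tolerate concurrent access, because ForAll invokes the action on several threads at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

var squares = new ConcurrentBag<long>();   // safe for concurrent Add calls
long total = 0;                            // updated only via Interlocked

Enumerable.Range(1, 10_000)
    .AsParallel()
    .Select(n => (long)n * n)    // compute-bound work, done in parallel
    .ForAll(square =>            // results consumed in parallel as well,
    {                            // with no merge back onto one thread
        squares.Add(square);
        Interlocked.Add(ref total, square);
    });

Console.WriteLine($"{squares.Count} squares, total = {total}");
```

The items arrive in the bag in no particular order; if order matters for the calculation, ForAll is the wrong consumer.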

Because Parallel LINQ processes items by using multiple threads, the items are processed concurrently and the results are returned in an unordered fashion. If you need to have Parallel LINQ preserve the order of items as they are processed, then you can call ParallelEnumerable’s AsOrdered method. When you call this method, threads will process items in groups and then the groups are merged back together, preserving the order; this will hurt performance. The following operators produce unordered operations: Distinct, Except, Intersect, Union, Join, GroupBy, GroupJoin, and ToLookup. If you need a specific order again after one of these operators, apply OrderBy (or OrderByDescending). Note that AsOrdered itself can be called only directly on the result of AsParallel, ParallelEnumerable.Range, or ParallelEnumerable.Repeat; elsewhere it throws an InvalidOperationException.

The following operators produce ordered operations: OrderBy, OrderByDescending, ThenBy, and ThenByDescending. If you want to go back to unordered processing again to improve performance after one of these operators, just call the AsUnordered method.
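The following sketch contrasts an unordered query, an AsOrdered query, and an OrderByDescending query followed by AsUnordered. The input values are arbitrary. AsOrdered is called directly on the result of AsParallel, which is the position where the API accepts it.

```csharp
using System;
using System.Linq;

int[] source = Enumerable.Range(0, 10_000).ToArray();

// No ordering requested: PLINQ may return the doubled values in any order.
int[] unordered = source.AsParallel()
                        .Select(n => n * 2)
                        .ToArray();

// AsOrdered makes PLINQ track each item's original position and merge the
// results back in source order, at some cost in speed.
int[] ordered = source.AsParallel()
                      .AsOrdered()
                      .Select(n => n * 2)
                      .ToArray();

// OrderByDescending imposes an order that Take depends on. After Take,
// order no longer matters for a sum, so AsUnordered lets the remaining
// operators skip the work of preserving it.
long sumOfTopHalf = source.AsParallel()
                          .OrderByDescending(n => n)
                          .Take(5_000)
                          .AsUnordered()
                          .Select(n => (long)n)
                          .Sum();

Console.WriteLine($"Unordered output happened to match source order: {unordered.SequenceEqual(ordered)}");
Console.WriteLine($"Sum of the 5,000 largest values: {sumOfTopHalf}");
```

Whether the unordered array happens to come back in order varies from run to run and machine to machine; only the AsOrdered query guarantees it.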

Parallel LINQ offers some additional ParallelEnumerable methods that you can call to control how the query is processed.

 

public static ParallelQuery<TSource> WithCancellation<TSource>(
   this ParallelQuery<TSource> source, CancellationToken cancellationToken)

public static ParallelQuery<TSource> WithDegreeOfParallelism<TSource>(
   this ParallelQuery<TSource> source, Int32 degreeOfParallelism)

public static ParallelQuery<TSource> WithExecutionMode<TSource>(
   this ParallelQuery<TSource> source, ParallelExecutionMode executionMode)

public static ParallelQuery<TSource> WithMergeOptions<TSource>(
   this ParallelQuery<TSource> source, ParallelMergeOptions mergeOptions)

 

Obviously, the WithCancellation method allows you to pass a CancellationToken so that the query processing can be stopped prematurely. The WithDegreeOfParallelism method specifies the maximum number of threads allowed to process the query; it does not force the threads to be created if not all of them are necessary. Usually you will not call this method, and, by default, the query will execute using one thread per core. However, you could call WithDegreeOfParallelism, passing a number that is smaller than the number of available cores if you want to keep some cores available for doing other work. You could also pass a number that is greater than the number of cores if the query performs synchronous I/O operations, because threads will be blocking during these operations. This ties up more threads but can produce the final result in less time. You might consider doing this in a client application, but I’d highly recommend against performing synchronous I/O operations in a server application.
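Here is a sketch of WithDegreeOfParallelism and WithCancellation under a few stated assumptions: Thread.Sleep stands in for real per-item work, a small helper named RecordMax (hypothetical, written only for this example) records how many delegates run at once, and the cancellation is triggered from inside the query purely to make the demonstration repeatable. In a real application the token would typically be canceled by other code, such as a Cancel button handler.

```csharp
using System;
using System.Linq;
using System.Threading;

// WithDegreeOfParallelism: cap the number of threads processing items.
int running = 0, maxRunning = 0;

long sum = Enumerable.Range(1, 100)
    .AsParallel()
    .WithDegreeOfParallelism(2)          // at most 2 threads process items
    .Select(n =>
    {
        int now = Interlocked.Increment(ref running);
        RecordMax(ref maxRunning, now);  // high-water mark of concurrency
        Thread.Sleep(1);                 // stand-in for real per-item work
        Interlocked.Decrement(ref running);
        return (long)n;
    })
    .Sum();

Console.WriteLine($"Sum = {sum}, max concurrent delegates = {maxRunning}");

// WithCancellation: stop a long-running query early.
using var cts = new CancellationTokenSource();
bool canceled = false;

try
{
    long result = Enumerable.Range(1, 10_000_000)
        .AsParallel()
        .WithCancellation(cts.Token)
        .Select(n =>
        {
            // Simulates a stop request arriving while the query is running.
            if (n == 1_000) cts.Cancel();
            return (long)n;
        })
        .Sum();
    Console.WriteLine($"Completed: {result}");
}
catch (OperationCanceledException)
{
    // Cancellation through the token passed to WithCancellation surfaces
    // as an OperationCanceledException rather than an AggregateException.
    canceled = true;
    Console.WriteLine("Query was canceled.");
}

// Hypothetical helper: atomically raises target to value if value is larger.
static void RecordMax(ref int target, int value)
{
    int current = Volatile.Read(ref target);
    while (value > current)
    {
        int previous = Interlocked.CompareExchange(ref target, value, current);
        if (previous == current) break;  // our value was stored
        current = previous;              // another thread raced us; retry
    }
}
```

PLINQ polls the token periodically rather than after every item, so a few more items may be processed after Cancel is called.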

Parallel LINQ analyzes a query and then decides how best to process it. Sometimes processing a query sequentially yields better performance. This is usually true when using any of these operations: Concat, ElementAt(OrDefault), First(OrDefault), Last(OrDefault), Skip(While), Take(While), or Zip. It is also true when using overloads of Select(Many) or Where that pass a position index into your selector or predicate delegate. However, you can force a query to be processed in parallel by calling WithExecutionMode, passing it one of the ParallelExecutionMode values.

 

public enum ParallelExecutionMode {
   Default          = 0,  // Let Parallel LINQ decide how best to process the query
   ForceParallelism = 1   // Force the query to be processed in parallel
}
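A minimal sketch of WithExecutionMode, assuming a query that uses the position-index overload of Select, one of the shapes that Parallel LINQ may otherwise choose to run sequentially. ForceParallelism asks it to run the query in parallel anyway. The ConcurrentDictionary of thread IDs is only instrumentation for the example; how many threads actually show up depends on the machine.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Records which managed threads ran the selector (illustration only).
var threadIds = new ConcurrentDictionary<int, bool>();

string[] labels = Enumerable.Range(0, 1_000)
    .AsParallel()
    .AsOrdered()                                   // keep index order in the output
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .Select((value, index) =>                      // position-index overload
    {
        threadIds.TryAdd(Environment.CurrentManagedThreadId, true);
        return $"{index}:{value * value}";
    })
    .ToArray();

Console.WriteLine($"{labels.Length} labels produced on {threadIds.Count} thread(s)");
```

Forcing parallelism is worth measuring rather than assuming: Parallel LINQ’s default choice exists because these query shapes often run faster sequentially.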


As mentioned before, Parallel LINQ has multiple threads processing items, and then the results must be merged back together. You can control how the items are buffered and merged by calling WithMergeOptions, passing it one of the ParallelMergeOptions values.

 

public enum ParallelMergeOptions {
   Default       = 0,  // Same as AutoBuffered today (could change in the future)
   NotBuffered   = 1,  // Results are processed as soon as they are ready
   AutoBuffered  = 2,  // Each thread buffers some results before they are processed
   FullyBuffered = 3   // Each thread buffers all results before they are processed
}

 

These options basically give you some control over speed versus memory consumption. NotBuffered saves memory and makes each result available as soon as it is produced, but the query as a whole runs more slowly. FullyBuffered consumes more memory while running fastest. AutoBuffered is the compromise between NotBuffered and FullyBuffered. Really, the best way to know which of these to choose for any given query is to try them all and compare their performance results, or just accept the default, which tends to work pretty well for many queries. See the following blog posts for more information about how Parallel LINQ partitions work across CPU cores:

http://blogs.msdn.com/pfxteam/archive/2009/05/28/9648672.aspx

http://blogs.msdn.com/pfxteam/archive/2009/06/13/9741072.aspx
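Because the recommendation above is to try each merge option and compare, here is a rough sketch of such a comparison. The workload (summing the squares of the first million integers) is an assumption chosen only for illustration, and Stopwatch timings from a single run are noisy, so treat the printed numbers as a starting point rather than a benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

var options = new[]
{
    ParallelMergeOptions.NotBuffered,
    ParallelMergeOptions.AutoBuffered,
    ParallelMergeOptions.FullyBuffered,
};
var sums = new Dictionary<ParallelMergeOptions, long>();

foreach (ParallelMergeOptions option in options)
{
    var stopwatch = Stopwatch.StartNew();
    long sum = 0;

    // The merge option matters when one thread (this foreach) consumes
    // the results that the worker threads produce.
    foreach (long square in Enumerable.Range(1, 1_000_000)
                                      .AsParallel()
                                      .WithMergeOptions(option)
                                      .Select(n => (long)n * n))
    {
        sum += square;
    }

    stopwatch.Stop();
    sums[option] = sum;
    Console.WriteLine($"{option,-14} {stopwatch.ElapsedMilliseconds,6} ms  sum = {sum}");
}
```

All three options must produce the same sum; only the timing and memory profile should differ.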


Date: 2016-03-03

