CUDA Parallel Computing

From OCAU Wiki

Based on Benny11's thread "Most Efficient CUDA Processing per Dollar!"



1.1 Introduction

For the past couple of months I have been trying to find the most cost-effective CUDA solution. I'm currently looking at building a small Beowulf cluster for CUDA algorithm processing on Linux. After looking around the net, though, I could never find a comprehensive benchmark list of different GPUs running CUDA. I'm not sure which GPU to buy, and I don't think many people interested in the scene know either. Sure, NVIDIA makes a GPU just for CUDA, but who wants to pay $1400 for it? So I thought I would create this thread to help everyone get interested in CUDA and find some cost-effective CUDA solutions...


1.2 What is CUDA?

NVIDIA CUDA (Compute Unified Device Architecture) is a C language environment. It lets programmers and developers solve complex computational problems in a fraction of the time by tapping into the many-core parallel processing power of GPUs. Millions of CUDA-capable GPUs are already deployed, and thousands of programmers are already using the free CUDA software tools to accelerate applications: video and audio encoding, oil and gas exploration, product design, medical imaging, and scientific research. Basically, CUDA is a software and GPU architecture that lets the many processor cores in a GPU (eventually thousands of them) perform general-purpose mathematical calculations.


1.3 But isn't parallel computing difficult?

Parallel programming has a reputation for being difficult because it traditionally meant making many CPUs work together (as in a cluster). Desktop applications have been slow to take advantage of multi-core CPUs because it is hard to split a single program into one that works across multiple threads. These difficulties arise because a CPU is inherently a serial processor, and using multiple CPUs requires complex software to manage them.
CUDA removes much of the burden of manually managing parallelism. At the heart of a CUDA program is a function called a kernel. It is written as ordinary serial code that handles a single piece of the data. The GPU makes it parallel by launching thousands of instances (threads) of the kernel at once. Since CUDA is an extension of C, porting simple data-parallel code to CUDA can be straightforward. In the simplest cases, the body of a loop becomes the kernel and the loop itself is replaced by a kernel launch.
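To make the "loop body becomes the kernel" idea concrete, here is a plain-C sketch of element-wise array addition written both ways. The names `add_serial`, `add_kernel` and `launch_add` are hypothetical. `launch_add` only emulates a launch by calling the kernel once per index in sequence. A rough CUDA equivalent is included as a comment and is not compiled here.

```c
#include <stddef.h>

/* Serial CPU version: one loop visits every element in turn. */
void add_serial(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* "Kernel" version (hypothetical name): the old loop body becomes a function
 * that handles exactly one element, chosen by its index. */
void add_kernel(size_t i, const float *a, const float *b, float *c)
{
    c[i] = a[i] + b[i];
}

/* CPU stand-in for a kernel launch: runs one "thread" per element, one after
 * another. A GPU would run these instances in parallel instead. */
void launch_add(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        add_kernel(i, a, b, c);
}

/* Rough CUDA equivalent (needs nvcc; not compiled here):
 *
 *   __global__ void add(const float *a, const float *b, float *c, int n)
 *   {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
 *       if (i < n)                                      // guard the last block
 *           c[i] = a[i] + b[i];
 *   }
 *
 *   add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);  // 256 threads per block
 */
```

Both versions produce identical results. The only change is who drives the loop: your code on the CPU, or the GPU's thread scheduler.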

The key features of CUDA are:

  • Shared memory: Every multiprocessor in the CUDA-capable GPUs covered here (compute capability 1.x) contains 16 KB of shared memory. This lets threads in the same block communicate with each other and share data. Shared memory can be thought of as a software-managed cache. It can give large speedups by reducing traffic to the GPU's main (global) memory. This benefits a number of common applications such as linear algebra, fast Fourier transforms, and image-processing filters.
  • Random read and write (i.e., gather and scatter): Fragment programs in the graphics API are limited to outputting 32 floats (4 RGBA components × 8 render targets) at pre-specified memory locations. CUDA, by contrast, supports scattered writes, i.e., an unlimited number of stores to any memory address. This enables many new algorithms that are not feasible using a graphics API.
  • Arrays and integer addressing: Graphics APIs force the user to store data as textures, which requires packing long arrays into 2D textures. This is cumbersome and imposes extra addressing math. CUDA allows data to be stored in standard arrays and can perform loads from any address.
  • Texturing Support: CUDA provides optimized texture access with automatic caching, free filtering and integer addressing.
  • Coalesced memory loads and stores: When neighbouring threads access neighbouring memory addresses, the GPU combines their individual loads or stores into a few wide memory transactions. It effectively reads or writes data in chunks, allowing near-peak use of memory bandwidth.
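The gather and scatter patterns above can be illustrated with plain C loops. In a CUDA kernel, `i` would be the thread index, so each thread performs one read or one write at an address of its choosing. `gather` and `scatter` are hypothetical helper names. Note that a parallel scatter with duplicate indices would race; this serial sketch sidesteps that.

```c
#include <stddef.h>

/* Gather: each output element reads from an arbitrary input position. */
void gather(const float *in, const size_t *idx, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[idx[i]];
}

/* Scatter: each input element is written to an arbitrary output position.
 * This is the "write anywhere" pattern fragment programs could not express. */
void scatter(const float *in, const size_t *idx, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[idx[i]] = in[i];
}
```

With `in = {10, 20, 30}` and `idx = {2, 0, 1}`, `gather` yields `{30, 10, 20}` and `scatter` yields `{20, 30, 10}`.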


1.4 How much faster is it? The GPU vs. CPU Architecture....

Suppose we have two arrays of 1,000 elements and want to add them element by element. The CPU program would step through the two arrays one element at a time, computing the sum at each position. For 1,000 elements, that takes 1,000 iterations.
On a GPU, the program is defined as a sum operation on a single pair of elements. When the GPU executes the program, it generates an instance of it for every element in the arrays. For 1,000 elements, it creates and launches 1,000 "sum threads". A GeForce GTX 280 has 240 cores. Assume an idealized model in which each core completes one addition per clock, so 240 threads are processed per clock. In that model the GTX 280 finishes all 1,000 additions in ceil(1000 / 240) = 5 cycles. Real timings are longer, because this model ignores memory access and scheduling overhead, which dominate for a simple operation like this.
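The cycle count in that idealized model is just a ceiling division. The sketch below computes it for the GTX 280 example. The function names are hypothetical, and the model deliberately ignores memory and launch overhead.

```c
/* Idealized model from the text: every core finishes one element per clock,
 * so n elements take ceil(n / cores) clocks. Integer arithmetic gives the
 * ceiling without needing <math.h>. Memory access, scheduling and launch
 * overhead are deliberately ignored. */
unsigned long ideal_gpu_clocks(unsigned long n, unsigned long cores)
{
    return (n + cores - 1) / cores;
}

/* The serial CPU loop in the text needs one iteration per element. */
unsigned long cpu_iterations(unsigned long n)
{
    return n;
}
```

For 1,000 elements on 240 cores, `ideal_gpu_clocks(1000, 240)` returns 5, versus 1,000 CPU iterations. That gap is an idealized 200×, not a prediction of real-world speedup.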


1.5 Before Benchmarking - A little theory:

In computing, floating point describes a way of representing numbers that would be too large or too small to be represented as integers. Numbers are generally represented approximately, to a fixed number of significant digits, and scaled using an exponent.
The term floating point refers to the fact that the radix point (decimal point, or, more commonly in computers, binary point) can "float"; that is, it can be placed anywhere relative to the significant digits of the number. This position is indicated separately in the internal representation, and floating-point representation can thus be thought of as a computer realization of scientific notation.

The advantage of floating-point representation over fixed-point (and integer) representation is that it can support a much wider range of values. For example, a fixed-point representation that has seven decimal digits, with the decimal point assumed to be positioned after the fifth digit, can represent the numbers 12345.67, 8765.43, 123.00, and so on, whereas a floating-point representation with seven decimal digits could in addition represent 1.234567, 123456.7, 0.00001234567, 1234567000000000, and so on. The floating-point format needs slightly more storage (to encode the position of the radix point), so when stored in the same space, floating-point numbers achieve their greater range at the expense of precision. Historically, different bases have been used for representing floating-point numbers, with base 2 (binary) being the most common, followed by base 10 (decimal), and other less common varieties such as base 16 (hexadecimal notation).
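The range-versus-precision trade-off is easy to see with the usual IEEE 754 binary formats. A 4-byte `float` has a 24-bit significand, so above 2^24 = 16,777,216 it can no longer represent every integer. It can still hold values around 10^30. The helper `via_float` is a hypothetical name used only for this illustration.

```c
#include <float.h>

/* Round a double to the nearest float and back. volatile makes the compiler
 * actually store the value as a 4-byte float instead of keeping extra
 * precision in a register. */
double via_float(double x)
{
    volatile float f = (float)x;
    return f;
}

/* On IEEE 754 platforms: FLT_MANT_DIG == 24 (about 7 decimal digits),
 * DBL_MANT_DIG == 53 (about 16 decimal digits). */
```

For example, `via_float(16777217.0)` comes back as 16777216.0 because the extra 1 falls below float's precision at that magnitude. A `double` holds 16777217 exactly.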

The speed of floating-point operations is measured in FLOPS (or flops, or flop/s), an acronym for FLoating-point Operations Per Second. FLOPS is a measure of a computer's performance, especially in fields of scientific computing that make heavy use of floating-point calculations.

Just to help put this into context (and an interesting fact):
The world's fastest supercomputer as of November 2009 is the Cray XT5, also known as Jaguar. It took the top spot from Roadrunner, which had held the number one position for 18 months. Jaguar recently upgraded its quad-core CPUs to hex-core Opteron processors. This gave it a theoretical peak of 2.3 petaFLOPS ("nearly a quarter of a million cores") and 1.75 petaFLOPS measured by the Linpack benchmark. For comparison, a hand-held calculator needs very few FLOPS. Each calculation request, such as adding or subtracting two numbers, requires only a single operation, so the calculator only has to keep up with the person pressing the keys. A response time below 0.1 seconds is usually perceived as instantaneous by a human operator, so a simple calculator needs only about 10 FLOPS to be functional.
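The arithmetic behind these figures is a simple ratio, sketched below with hypothetical helper names. FLOPS is operations divided by seconds. Comparing a measured figure (such as Linpack) with the theoretical peak gives the efficiency actually achieved.

```c
/* FLOPS = floating-point operations performed / elapsed seconds. */
double flops(double operations, double seconds)
{
    return operations / seconds;
}

/* Fraction of theoretical peak actually achieved, e.g. on Linpack. */
double efficiency(double measured_flops, double peak_flops)
{
    return measured_flops / peak_flops;
}
```

The calculator case is `flops(1, 0.1)`, i.e. 10 FLOPS. For Jaguar, `efficiency(1.75e15, 2.3e15)` is about 0.76, so the Linpack run reached roughly 76% of the theoretical peak.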


1.6 Benchmarking Methods:

After hours of searching for a way to benchmark CUDA on a specific setup, I came to no definitive method. For the purposes of this test, however, I have chosen two simple methods (one a little more complex than the other, but still simple) for benchmarking CUDA.

1.6.1 Requirements:

1.6.2 Method 1:

The first and easiest method is to use a simple program called CUDA-Z (download from http://sourceforge.net/projects/cuda-z/files/cuda-z/0.5/CUDA-Z-0.5.95.exe/download). Run the program, go to the "Performance" tab and click Export > To Text. Copy the information into your post. MAKE SURE YOU DO NOT HAVE ANY GPU-INTENSIVE PROGRAMS RUNNING! It is best to close all programs before exporting, as GPU-intensive programs will change the results.

1.6.3 Method 2:

Being re-thought... It's too buggy.

1.6.4 Benchmark Submission Post Format:

I will give an example below if you are unsure!

  • Operating System:
  • CPU:
  • Driver Version: (Required, please! See 1.6.5 if you are unsure how to find it.)
  • Core/Shader/Memory clocks [C/S/M]: Please include these! If you're unsure, download GPU-Z.
  • Current Average Price of GPU: (if you can find it; try www.staticice.com.au)
  • Method 1 output:
  • Method 2 output: (when updated)

And of course... this is OCAU... Please specify any CPU or GPU overclocks. If you can and are willing to, overclock your GPU, run the tests again and repost your results, including the specs of your overclock.

1.6.5 Finding your Driver Version

  • Method 1: Open Run (Start > Run, or Windows key + R), type dxdiag and click OK. In the window that pops up, click the Display tab and note the version listed under Drivers on the right.
  • Method 2: If Method 1 did not work, open Device Manager (search for it in Vista, or open Control Panel, switch to the advanced view and double-click Device Manager). Click the + next to Display adapters, then right-click your GPU and click Properties. In the Properties window, click the Driver tab and note the Driver Version.


1.7 Conclusion:

In conclusion, this post should help find the most cost-effective CUDA solution. That will bring the OCAU rep up and save money for people who are interested in building CUDA systems...


1.8 Notes and Improvements to Thread:

  • OP is not finished.... Need to add a bit more and simplify some things
  • A few more things that I have forgotten for the moment...
  • Add some pictures...


1.9 References:

  • Still need to add

Thanks to anyone who can help...

Any post welcome...


2.1 How the results are calculated...

At the moment I have yet to find a true CUDA benchmarking tool for Windows. (I am doing the tests on Windows because I'm sure at least 99% of us here with an NVIDIA GPU will be running at least one Microsoft OS.) Some of the tests I have come up with have proven to be very buggy and unstable, and for that reason I will not be using them.

Until I can come up with or find a true CUDA benchmark, I will only be using the methods above. They may not be the best indication, but the results they produce are practical, measured results rather than theoretical figures from the NVIDIA website.

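At the moment all results will be recorded as MFLOPS/$ (mega-FLOPS per dollar). This is calculated by dividing the Single-precision Float result by the average price of the GPU. Single precision is used because it occupies only 4 bytes and is supported by every CUDA-enabled GPU. Double precision needs compute capability 1.3 or higher (e.g. the GTX 260/275/285/295 below), which is why the cheaper cards show "Not Supported". Where double precision is available, the Double-precision Float result is divided by the price in the same way.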

Units:

  • MSPFLOPS/$: Mega Single-Precision Floating-point Operations Per Second per Dollar
  • MDPFLOPS/$: Mega Double-Precision Floating-point Operations Per Second per Dollar
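The per-dollar figures are straightforward divisions. The sketch below uses a hypothetical `struct result` whose fields mirror the submission format. A negative return value is used here to stand for "N/A" when a card reports double precision as Not Supported.

```c
#include <stdbool.h>

/* Hypothetical record for one submission; field names are illustrative. */
struct result {
    double sp_mflops;     /* CUDA-Z "Single-precision Float" (Mflop/s) */
    double dp_mflops;     /* CUDA-Z "Double-precision Float" (Mflop/s) */
    bool   dp_supported;  /* false when CUDA-Z reports "Not Supported" */
    double price;         /* average price in dollars */
};

/* MSPFLOPS/$: single-precision throughput divided by price.
 * Returns a negative value to mean "N/A" (no usable price). */
double mspflops_per_dollar(const struct result *r)
{
    return r->price > 0.0 ? r->sp_mflops / r->price : -1.0;
}

/* MDPFLOPS/$: double-precision throughput divided by price.
 * Returns a negative value to mean "N/A" (no DP support or no price). */
double mdpflops_per_dollar(const struct result *r)
{
    return (r->dp_supported && r->price > 0.0) ? r->dp_mflops / r->price : -1.0;
}
```

For the 8400M GS example, 25438.3 Mflop/s at $65 gives about 391.4 MSPFLOPS/$, and MDPFLOPS/$ comes out as N/A.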


2.2 Example (please post in full in the thread linked at the top):

Operating System: Windows Vista Home Premium 32bit
CPU: AMD Turion x2 Mobile @ 2.0GHz
Device Driver: 8.16.11.8766 (Modded inf to replace out of date HP Drivers)
C/S/M: 400/400/800

Core Information
Name: GeForce 8400M GS
Compute Capability: 1.1
Clock Rate: 800 MHz
Multiprocessors: 2
Warp Size: 32
Regs Per Block: 8192
Threads Per Block: 512
Watchdog Enabled: No
Threads Dimentions: 512 x 512 x 64
Grid Dimentions: 65535 x 65535 x 1

Memory Information
Total Global: 256 MB
Shared Per Block: 16 KB
Pitch: 256 KB
Total Constant: 64 KB
Texture Alignment: 256
GPU Overlap: Yes

Performance Information
Memory Copy
Host Pinned to Device: 2597.04 MB/s
Host Pageable to Device: 1070.09 MB/s
Device to Host Pinned: 2620.34 MB/s
Device to Host Pageable: 1153.97 MB/s
Device to Device: 2082.37 MB/s
GPU Core Performance
Single-precision Float: 25438.3 Mflop/s
Double-precision Float: Not Supported
32-bit Integer: 5095.79 Miop/s
24-bit Integer: 25434.9 Miop/s
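If you are tallying many submissions, a small parser can pull the numbers straight out of the CUDA-Z export instead of retyping them. This is a sketch only. The `Label: value unit` line layout is assumed from the sample above, other CUDA-Z versions may differ, and `parse_cudaz_value` is a hypothetical helper, not part of CUDA-Z.

```c
#include <stdio.h>
#include <string.h>

/* Parses one line of CUDA-Z's exported text, e.g.
 *   "Single-precision Float: 25438.3 Mflop/s"
 * Returns 1 and stores the number in *value if the line starts with
 * `label` followed by ':' and a number. Returns 0 if the label does not
 * match or the value is "Not Supported". */
int parse_cudaz_value(const char *line, const char *label, double *value)
{
    size_t len = strlen(label);
    if (strncmp(line, label, len) != 0 || line[len] != ':')
        return 0;

    const char *rest = line + len + 1;
    if (strstr(rest, "Not Supported") != NULL)
        return 0;

    /* " %lf" skips the space after the colon and reads the number;
     * the trailing unit (Mflop/s, MB/s, ...) is ignored. */
    return sscanf(rest, " %lf", value) == 1;
}
```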


2.3 Results:

PLEASE CORRECT ME IF ANY PRICES ARE WRONG! The price is a vital part of the result, but for some items I cannot find it, so I am relying on everyone for prices!


Submitted by nirvana_1911 using GeForce GTX295 (OVERCLOCKED - [C/S/M] [650/1401/1025])

  • Operating System: Windows 7 64bit
  • CPU: i7 920 @ 3.8GHz
  • Device Driver: Unknown
  • GPU Memory: 896 MB
  • Average cost: $598
  • Single-precision Float: 670642 Mflop/s
  • Double-precision Float: 84191.1 Mflop/s
  • MSPFLOPS/$: 1121.48
  • MDPFLOPS/$: 140.79

Submitted by Trdboy using GeForce GTX295 (ALL STOCK - [C/S/M] [576/1242/2000])

  • Operating System: Windows 7 64bit
  • CPU: i5 750 @ 3.8GHz
  • Device Driver: Unknown
  • GPU Memory: 896 MB
  • Average cost: $598
  • Single-precision Float: 593499 Mflop/s
  • Double-precision Float: 74480.9 Mflop/s
  • MSPFLOPS/$: 992.47
  • MDPFLOPS/$: 124.55

Submitted by SoundEngine355 using GeForce GTX295 (ALL STOCK - [C/S/M] [-/-/-])

  • Operating System: Windows 7 64bit
  • CPU: i7 920 @ 4.0GHz
  • Device Driver: Unknown
  • GPU Memory: 896 MB
  • Average cost: $598
  • Single-precision Float: 593479 Mflop/s
  • Double-precision Float: 74480.7 Mflop/s
  • MSPFLOPS/$: 992.44
  • MDPFLOPS/$: 124.55

Submitted by Checkz using GeForce GTX285 (OVERCLOCKED - [C/S/M] [675/1548/1269])

  • Operating System: Windows Vista 64bit
  • CPU: i7 920 @ 2.66GHz
  • Device Driver: 196.21
  • GPU Memory: 1024 MB
  • Average cost: $442
  • Single-precision Float: 739681 Mflop/s
  • Double-precision Float: 90824.6 Mflop/s
  • MSPFLOPS/$: 1673.49
  • MDPFLOPS/$: 205.49

Submitted by // BiZ using GeForce GTX285 (ALL STOCK - [C/S/M] [-/-/-])

  • Operating System: Windows Vista 64bit
  • CPU: E8400 @ 4.0GHz
  • Device Driver: Unknown
  • GPU Memory: 2048 MB
  • Average cost: $535
  • Single-precision Float: 722830 Mflop/s
  • Double-precision Float: 87999.7 Mflop/s
  • MSPFLOPS/$: 1351.08
  • MDPFLOPS/$: 164.49

Submitted by Dave2972 using GeForce GTX285 (ALL STOCK - [C/S/M] [-/-/-])

  • Operating System: Windows 7 64bit
  • CPU: x4 955 @ 3.2GHz
  • Device Driver: 190.62
  • GPU Memory: 1024 MB
  • Average cost: $442
  • Single-precision Float: 705320 Mflop/s
  • Double-precision Float: 85465.4 Mflop/s
  • MSPFLOPS/$: 1595.75
  • MDPFLOPS/$: 193.36

Submitted by Karlcloudy using GeForce GTX275 (OVERCLOCKED - [C/S/M] [715/1550/1260])

  • Operating System: Windows 7 64bit
  • CPU: Q9550 @ 3.74GHz
  • Device Driver: Unknown
  • GPU Memory: 896 MB
  • Average cost: $270
  • Single-precision Float: 739986 Mflop/s
  • Double-precision Float: 90718.4 Mflop/s
  • MSPFLOPS/$: 2740.69
  • MDPFLOPS/$: 335.99

Submitted by LoL using GeForce GTX275 (OVERCLOCKED - [C/S/M] [633/1545/2268])

  • Operating System: Windows 7 64bit
  • CPU: Q9650 @ 3.80GHz
  • Device Driver: 196.34
  • GPU Memory: 896 MB
  • Average cost: $270
  • Single-precision Float: 739944 Mflop/s
  • Double-precision Float: 90798.2 Mflop/s
  • MSPFLOPS/$: 2740.53
  • MDPFLOPS/$: 336.29

Submitted by Karlcloudy using GeForce GTX275 (OVERCLOCKED - [C/S/M] [660/1460/1200])

  • Operating System: Windows 7 64bit
  • CPU: Q9550 @ 3.74GHz
  • Device Driver: Unknown
  • GPU Memory: 896 MB
  • Average cost: $270
  • Single-precision Float: 696900 Mflop/s
  • Double-precision Float: 84482.8 Mflop/s
  • MSPFLOPS/$: 2581.11
  • MDPFLOPS/$: 312.90

Submitted by hate-xfiles using GigaByte GTX260 (OVERCLOCKED - [C/S/M] [750/1587/1320])

  • Operating System: Unknown
  • CPU: Q9550 @ 3.75GHz
  • Device Driver: Unknown
  • GPU Memory: 896 MB
  • Average cost: $260
  • Single-precision Float: 343637 Mflop/s (unexpected result: roughly double this would be expected)
  • Double-precision Float: 42126 Mflop/s
  • MSPFLOPS/$: 1321.7
  • MDPFLOPS/$: 162

Submitted by kilebantick using GeForce GTX260 (OVERCLOCKED - [C/S/M] [680/1500/2500])

  • Operating System: Windows 7 64bit
  • CPU: E5200 @ 2.55GHz
  • Device Driver: 8.17.11.9562
  • GPU Memory: 896 MB
  • Average cost: $255
  • Single-precision Float: 603891 Mflop/s
  • Double-precision Float: 74648.1 Mflop/s
  • MSPFLOPS/$: 2368.2
  • MDPFLOPS/$: 292.74

Submitted by cevtech using Gainward GTX260 (ALL STOCK - [C/S/M] [-/-/-])

  • Operating System: Windows 7 (32-bit?)
  • CPU: i7 860
  • Device Driver: Unknown
  • GPU Memory: 896 MB
  • Average cost: $250
  • Single-precision Float: 580631 Mflop/s
  • Double-precision Float: 71189.4 Mflop/s
  • MSPFLOPS/$: 2322.5
  • MDPFLOPS/$: 284.8

Submitted by anlashok using GeForce GTS 250 (ALL STOCK - [C/S/M] [738/1836/2200])

  • Operating System: Win XP
  • CPU: i5 750 @ 3.5GHz
  • Device Driver: 191.07 (6.14)
  • GPU Memory: 512 MB
  • Average cost: $125
  • Single-precision Float: 467397 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 3739.2
  • MDPFLOPS/$: N/A

Submitted by anlashok using GeForce GTS 250 (UNDERCLOCKED- [C/S/M] [675/1458/1800])

  • Operating System: Win XP
  • CPU: i5 750 @ 3.5GHz
  • Device Driver: 191.07 (6.14)
  • GPU Memory: 512 MB
  • Average cost: $125
  • Single-precision Float: 371421 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 2971.4
  • MDPFLOPS/$: N/A

Submitted by JuzIE using XFX 9800GTX+ (ALL STOCK - [C/S/M] [-/-/-])

  • Operating System: Vista Ultimate 32bit
  • CPU: Q9550 @ 3.4GHz
  • Device Driver: 8.16.0011.9107
  • GPU Memory: 512 MB
  • Average cost: $100 (price thanks to JuzIE; I thought they were a bit more expensive)
  • Single-precision Float: 467838 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 4678.4
  • MDPFLOPS/$: N/A

Submitted by BlueSteel using 9800GTX+ (ALL STOCK - [C/S/M] [-/-/-])

  • Operating System: Windows 7 64bit
  • CPU: Unknown
  • Device Driver: Unknown
  • GPU Memory: 512 MB
  • Average cost: $100 (price thanks to JuzIE; I thought they were a bit more expensive)
  • Single-precision Float: 467503 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 4675.0
  • MDPFLOPS/$: N/A

Submitted by Nitecore using GeForce 9800GT (OVERCLOCKED Core UNDERCLOCKED Shaders - [C/S/M] [700/1100/1800])

  • Operating System: Windows 7 64bit
  • CPU: Q9400 @ 3.0GHz
  • Device Driver: Unknown
  • GPU Memory: 512 MB
  • Average cost: $95
  • Single-precision Float: 397387 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 4183.02
  • MDPFLOPS/$: N/A

Submitted by Spork! using GeForce 9800GT (OVERCLOCKED - [C/S/M] [667/1667/900])

  • Operating System: Windows 7 64bit
  • CPU: Q6600 @ 3.33GHz
  • Device Driver: 196.21
  • GPU Memory: 512 MB
  • Average cost: $95
  • Single-precision Float: 373294 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 3929.41
  • MDPFLOPS/$: N/A

Submitted by Spork! using GeForce 9800GT (UNDERCLOCKED - [C/S/M] [550/1350/900])

  • Operating System: Windows 7 64bit
  • CPU: Q6600 @ 3.33GHz
  • Device Driver: 196.21
  • GPU Memory: 512 MB
  • Average cost: $95
  • Single-precision Float: 301152 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 3170.02
  • MDPFLOPS/$: N/A

Submitted by EC MEISTER using GeForce 8800GT (OVERCLOCKED - [C/S/M] [650/900/1625])

  • Operating System: Windows 7 64bit
  • CPU: E6420 @ 3.2GHz
  • Device Driver: Unknown
  • GPU Memory: 512 MB
  • Average cost: $80
  • Single-precision Float: 361148 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 4514.35
  • MDPFLOPS/$: N/A

Submitted by LostBenji using GeForce 8800 GT (ALL STOCK - [C/S/M] [-/-/-])

  • Operating System: Windows 7 x64 Ultimate
  • CPU: Q9550 @ 3.7GHz
  • Device Driver: Unknown
  • GPU Memory: 1024 MB
  • Average cost: $80 (price thanks to JuzIE)
  • Single-precision Float: 337144 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 4214.3
  • MDPFLOPS/$: N/A

First Submission by MrSmoke using 8800GT (ALL STOCK - [C/S/M] [-/-/-])(GPU0 in SLI 750i)

  • Operating System: Windows Vista 64bit
  • CPU: Q9550 @ 3.4GHz
  • Device Driver: Unknown
  • GPU Memory: 512 MB
  • Average cost: $80
  • Single-precision Float: 264813 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 3310.2
  • MDPFLOPS/$: N/A

First Submission by MrSmoke using 8800GT (ALL STOCK - [C/S/M] [-/-/-])(GPU1 in SLI 750i)

  • Operating System: Windows Vista 64bit
  • CPU: Q9550 @ 3.4GHz
  • Device Driver: Unknown
  • GPU Memory: 512 MB
  • Average cost: $80
  • Single-precision Float: 385197 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 4815.0
  • MDPFLOPS/$: N/A

Submitted by Louis Cyphre using GeForce 9600GT (ALL STOCK - [C/S/M] [-/-/-])(Passive Cooled)

  • Operating System: Windows 7 64bit
  • CPU: Q6600 @ 3.0GHz
  • Device Driver: 8.17.11.9562
  • GPU Memory: 512 MB
  • Average cost: $85
  • Single-precision Float: 203853 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 2398.27
  • MDPFLOPS/$: N/A

Submitted by Lei using Gainward 9500gt (ALL STOCK - [C/S/M] [-/-/-])

  • Operating System: Windows .... 32bit
  • CPU: Unknown
  • Device Driver: Unknown
  • GPU Memory: 512 MB
  • Average cost: $80
  • Single-precision Float: 89105.3 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 1113.8
  • MDPFLOPS/$: N/A

Submitted by Me (Benny11) using 8400M GS (ALL STOCK - [C/S/M] [400/400/800]):

  • Operating System: Windows Vista Home Premium 32bit
  • CPU: AMD Turion x2 Mobile @ 2.0GHz
  • Device Driver: 8.16.11.8766 (Modded inf to replace out of date HP Drivers)
  • GPU Memory: 256 MB
  • Average cost: $65
  • Single-precision Float: 25438.3 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 391.4
  • MDPFLOPS/$: N/A


2.4 Interesting Results:

In the process of gathering results, a few more interesting things have come to light.

First, the shaders: they turn out to be the key workhorse for this style of parallel computing. Below is an example of a GTS 250 running at stock speeds and at reduced speeds. As you will see, if we raise just the shader clock back up to stock speed, we get almost the same score as at full stock. The core and memory clocks have little effect on the outcome of this type of benchmark.


Submitted by anlashok using GeForce GTS 250 (ALL STOCK - [C/S/M] [738/1836/2200])

  • Operating System: Windows XP
  • CPU: i5 750 @ 3.5GHz
  • Device Driver: 191.07
  • GPU Memory: 512 MB
  • Average cost: $125
  • Single-precision Float: 467397 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 3739.18
  • MDPFLOPS/$: N/A


Submitted by anlashok using GeForce GTS 250 (UNDERCLOCKED - [C/S/M] [675/1458/1800])(Green Edition)

  • Operating System: Windows XP
  • CPU: i5 750 @ 3.5GHz
  • Device Driver: 191.07
  • GPU Memory: 512 MB
  • Average cost: $125
  • Single-precision Float: 371421 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 2971.37
  • MDPFLOPS/$: N/A


Submitted by anlashok using GeForce GTS 250 (UNDERCLOCKED CORE+MEMORY - SHADERS @ STOCK - [C/S/M] [675/1836/1800])(Green Edition)

  • Operating System: Windows XP
  • CPU: i5 750 @ 3.5GHz
  • Device Driver: 191.07
  • GPU Memory: 512 MB
  • Average cost: $125
  • Single-precision Float: 467247 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 3737.98
  • MDPFLOPS/$: N/A
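One way to check the "shaders are the workhorse" observation is to assume throughput scales linearly with the shader clock alone, then compare that prediction with the three GTS 250 results above. The sketch below encodes that assumption. `predict_by_shader_clock` is a hypothetical helper, and the linear model is an assumption being tested, not something CUDA-Z reports.

```c
/* Simple model: if throughput is set by the shader clock alone, it scales
 * linearly with it,
 *   predicted = stock_mflops * (new_shader_mhz / stock_shader_mhz).
 * Core and memory clocks do not appear in the model at all. */
double predict_by_shader_clock(double stock_mflops, double stock_shader_mhz,
                               double new_shader_mhz)
{
    return stock_mflops * (new_shader_mhz / stock_shader_mhz);
}
```

Scaling the stock 467397 Mflop/s from a 1836 MHz to a 1458 MHz shader clock predicts about 371168 Mflop/s. The underclocked run measured 371421, within about 0.07%. With shaders back at 1836 MHz but core and memory still underclocked, the model predicts no change, and the measured 467247 is within about 0.03% of stock.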


Second, when building a cheap GPU crunching box, you will most likely find that not all PCI-E slots are created equal. The main slots generally run at x16, but on most boards with several GPU slots you will find some running at only x4. For gaming this is bad news, but not so much for number crunching. It will of course depend on what you are crunching. As you can see below, though, the only thing affected is the transfer bandwidth between system memory and the card over the PCI-E bus (roughly 5.6 GB/s versus 0.74 GB/s here). The GPU itself still crunches just as well. Folding@Home, for example, doesn't need to stream the large maps/textures that a game engine does, so it is quite happy to fold in an x16 or an x4 slot alike.


Submitted by anlashok using GeForce GTS 250 (ALL STOCK - [C/S/M] [738/1836/2200]) (Running in a primary x16 slot)

  • Operating System: Windows XP
  • CPU: i5 750 @ 3.5GHz
  • Device Driver: 191.07
  • GPU Memory: 512 MB
  • Average cost: $125
  • Single-precision Float: 467397 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 3739.18
  • MDPFLOPS/$: N/A
  • Memory Copy
  • Host Pinned to Device: 5613.96 MB/s
  • Host Pageable to Device: 4154.75 MB/s
  • Device to Host Pinned: 5613.22 MB/s
  • Device to Host Pageable: 4189.35 MB/s


Submitted by anlashok using GeForce GTS 250 (ALL STOCK - [C/S/M] [738/1836/2200]) (Running in a secondary x4 slot)

  • Operating System: Windows XP
  • CPU: i5 750 @ 3.5GHz
  • Device Driver: 191.07
  • GPU Memory: 512 MB
  • Average cost: $125
  • Single-precision Float: 467009 Mflop/s
  • Double-precision Float: Not Supported
  • MSPFLOPS/$: 3736.07
  • MDPFLOPS/$: N/A
  • Memory Copy
  • Host Pinned to Device: 739.282 MB/s
  • Host Pageable to Device: 727.411 MB/s
  • Device to Host Pinned: 739.197 MB/s
  • Device to Host Pageable: 727.722 MB/s
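To see why the x4 slot matters so little for crunching, the sketch below combines the measured Host Pinned to Device rates with a hypothetical workload. It assumes 256 MB copied to the card followed by 10 seconds of GPU compute. Both numbers are illustrative assumptions, as are the helper names.

```c
/* Seconds needed to move `megabytes` over PCI-E at a measured rate (MB/s). */
double transfer_seconds(double megabytes, double mb_per_s)
{
    return megabytes / mb_per_s;
}

/* Total time for a simple "copy in, then compute" job. compute_seconds is
 * the time spent crunching on the GPU; it does not depend on slot width. */
double job_seconds(double compute_seconds, double megabytes, double mb_per_s)
{
    return compute_seconds + transfer_seconds(megabytes, mb_per_s);
}
```

The copy itself is about 7.6× slower in the x4 slot (about 0.35 s versus 0.05 s for 256 MB). The whole hypothetical job, however, takes only about 3% longer (about 10.35 s versus 10.05 s). The more compute a job does per byte transferred, the smaller that penalty becomes.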

