MUMPS

MUMPS (Massachusetts General Hospital Utility Multi-Programming System), or alternatively M, is a programming language created in the late 1960s for use in the healthcare industry. It was designed to make writing database-driven applications easy while simultaneously making as efficient use of computing resources as possible. Although it never gained widespread popularity, it was adopted as the language-of-choice for many healthcare and financial information systems/databases (especially ones developed in the 1970s and early 1980s) and continues to be used by many of the same clients today.

Because it predates C and most other popular languages in current use, its syntax and terminology are very different. It offers a number of features unavailable in other languages and showcases some rarely used programming and database concepts.

Overview

MUMPS is a language designed for building database applications. Secondary language features are designed to help programmers make applications that use as few computing resources as possible. Original implementations were interpreted, though modern implementations may be either fully or partially compiled.

The core feature of MUMPS is that database interaction is transparently built into the language. Simply by using a variable prefixed with a caret ('^'), you are referencing a database node. Assignment and retrieval use the same commands as for standard RAM-based variables. Additionally, all variables (both RAM-based and database-based) can be treated as multidimensional hashes/arrays. Child nodes of a variable (called subscripts in M) can have numeric or string keys (the keys themselves are also called subscripts, so that in the variable name ^A("B",2,6), "B", 2 and 6 are the first, second and third subscripts of ^A). Numeric keys can have negative and/or floating-point values, all of which are stored in order from lowest to highest. String keys are automatically stored in alphabetical order following all numeric keys. The MUMPS term for a database-linked variable is a global, not to be confused with the C term for unscoped variables (see Variable scoping).
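
The ordering rule described above can be sketched with a short fragment. The global name ^DEMO and its contents are hypothetical, chosen only for illustration, and the fragment assumes ^DEMO is empty beforehand and that the implementation uses the standard collation.

```mumps
 ; ^DEMO is a hypothetical global used only for this illustration
 SET ^DEMO("pear")="string key"
 SET ^DEMO(10)="integer key"
 SET ^DEMO(-3.5)="negative fractional key"
 SET ^DEMO("apple")="another string key"
 ; Walk the first-level subscripts in stored (collation) order
 SET key=""
 FOR  SET key=$ORDER(^DEMO(key)) QUIT:key=""  WRITE key," = ",^DEMO(key),!
```

Under standard collation the loop should visit -3.5, then 10, then "apple", then "pear", regardless of the order in which the nodes were set.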

As a secondary language feature, you can abbreviate nearly all commands and native functions down to a single character to save space. Additionally, special operators exist to let you treat a delimited string (like Comma-separated values) as an array. To reduce the number of hard-disk reads, early MUMPS programmers would store a structure of related information as a delimited string, parsing it out after it was read.
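
As a rough sketch of the delimited-record technique, the fragment below stores three fields in one node and parses them with $PIECE after a single read. The global ^PAT, the record layout and the field values are assumptions made up for the example.

```mumps
 ; ^PAT and its caret-delimited layout are hypothetical
 SET ^PAT(1001)="Dobbs^Bob^1953-07-04"
 ; One database read, then parse the pieces in memory
 SET rec=^PAT(1001)
 WRITE $PIECE(rec,"^",2)," ",$PIECE(rec,"^",1),!
 ; The same two steps with abbreviated commands and functions
 S rec=^PAT(1001) W $P(rec,"^",2)," ",$P(rec,"^",1),!
```

Both WRITE commands print "Bob Dobbs"; the abbreviated line is simply a shorter spelling of the same commands.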

MUMPS has no data types. Numbers can be treated as strings of digits, and strings can be cast (coerced, in MUMPS terminology) into numbers by numeric operators. When a string is coerced, the parser turns as much of the string (starting from the left) into a number as it can, then discards the rest. Thus the condition in 'IF 20<"30 DUCKS"' evaluates as TRUE in MUMPS.
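
A few one-line experiments illustrate the coercion rule. The literals are arbitrary; the unary plus operator is used to force numeric interpretation.

```mumps
 WRITE +"30 DUCKS",!          ; leading digits are kept: 30
 WRITE "12 eggs"+"3 hens",!   ; both operands are coerced: 15
 WRITE +"DUCKS",!             ; no leading number at all: 0
 WRITE 20<"30 DUCKS",!        ; the comparison is numeric: 1 (true)
 WRITE "3"_4,!                ; concatenation treats both as strings: 34
```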

Other features of the language standard are designed to help applications interact with each other in a multi-user environment. Database locks, process identifiers and atomicity of database update transactions are all required of MUMPS implementations that follow the standard.
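
The following is a minimal sketch of how these features might be combined, assuming an implementation that supports incremental locks and the TSTART/TCOMMIT commands of the 1995 standard. The global ^ACCT, the account numbers and the amount are hypothetical, and transaction details vary between implementations.

```mumps
 ; ^ACCT and the amounts are hypothetical, for illustration only
 ; Try for up to 5 seconds to lock the node; $TEST records the outcome
 LOCK +^ACCT(1):5 ELSE  WRITE "could not get lock",! QUIT
 TSTART
 SET ^ACCT(1)=$GET(^ACCT(1))-100
 SET ^ACCT(2)=$GET(^ACCT(2))+100
 TCOMMIT
 LOCK -^ACCT(1)
 ; $JOB holds the identifier of the current process
 WRITE "updated by process ",$JOB,!
```

The two SET commands between TSTART and TCOMMIT are intended to take effect together or not at all.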

In contrast to languages based on C, whitespace is significant. A single space separates a command from its argument, and a space or newline separates the argument from the next command. Commands that take no arguments (like ELSE) require two following spaces: one to separate the command from its (nonexistent) argument, then another to separate the "argument" from the next command. Newlines are also significant; an IF, ELSE or FOR command processes/skips everything else on the line. To make them affect multiple lines, you must use the DO command to create a block.
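
These spacing rules can be seen in a small subroutine. The label size and its messages are hypothetical; the block form shown uses an argumentless DO followed by lines that begin with a dot, as found in later revisions of the standard.

```mumps
size(x) ; hypothetical label, for illustration only
 ; IF and ELSE each govern only the rest of their own line
 IF x>10 WRITE "big",!
 ELSE  WRITE "small",!
 ; An argumentless DO runs the dotted lines below it as a block
 IF x>10 DO
 . WRITE "more than ten",!
 . WRITE "still inside the block",!
 QUIT
```

Note the two spaces after ELSE, and that the argumentless DO is the last command on its line. Calling DO size(20) prints "big" followed by the two block lines; DO size(5) prints only "small".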

"Hello, World!" in MUMPS

A simple Hello world program in MUMPS is:

 hello() 
        write "Hello, World!",!
        quit

and would be run from the MUMPS command line with the command 'do ^hello()'. Since MUMPS allows commands to be strung together on the same line and commands abbreviated to a single letter, this routine could be even more compact:

 hello() w "Hello, World!",! q

The ',!' after the text generates a newline. The 'quit' is not strictly necessary at the end of a function, but is good programming practice in case other functions are added below 'hello()' later.

Or one could generate an HTML file, with each section created by its own function:

  

  hellohtml()  
        ; This redirects all output to a file, here an html file.
        SET dev="www/HelloWorld.htm" OPEN dev         
        USE dev DO html CLOSE dev
        QUIT
  html  W !,"<html>"  DO head,body W "</html>",! Q
  head  ; similar javascript and style subroutines could be added.
        W !,"<head>"  DO javascript,style W "</head>",! Q     
  body  W !,"<body>"  DO H1 W !,"</body>",! Q
  H1    W !,"<H1 font-size='20pt'>"
        W "Hello World from MUMPS via HTML !"
        W "</H1>",! QUIT
  javascript   QUIT  ; placeholder: write script content here
  style        QUIT  ; placeholder: write style content here

History

MUMPS, the Massachusetts General Hospital Utility Multi-Programming System, was developed by Neil Pappalardo in Octo Barnett's animal lab at Massachusetts General Hospital (MGH) in Boston during 1966 and 1967. The original MUMPS system was built on a spare DEC PDP-7.

Octo Barnett and Neil Pappalardo, who were also involved with MGH's plans for a Hospital Information System, obtained a PDP-9 and began using MUMPS in the admissions cycle and laboratory test reporting. MUMPS was then an interpreted language and incorporated a hierarchical database file system to standardize interaction with the data. The origins of MUMPS can be traced from Rand Corporation's JOSS through BBN's TELCOMP and STRINGCOMP. The MUMPS team deliberately chose to write the new language with portability in mind. Another feature not widely supported in operating systems of the era was multitasking, which was also built into MUMPS itself.

MUMPS was soon ported to a PDP-15, where it lived for some time. Developed on a government grant, MUMPS was required to be released into the public domain (no longer a requirement for grants), and was soon ported to a number of other systems including the popular PDP-8, the Data General Nova and the PDP-11. Word of MUMPS spread mostly through the medical community, and by the early 1970s it was in widespread use, often being locally modified by sites for their own needs.

By the early 1970s there were many varied implementations of MUMPS on a range of hardware platforms. The most widespread were DEC's MUMPS-11 on the PDP-11 and Meditech's MIIS. In 1972 various MUMPS users gathered in order to standardize the now fractured language, creating the MUMPS Users Group and MUMPS Development Committee (MDC). These efforts proved successful; a standard was complete by 1974, and was approved on September 15, 1977, as ANSI standard X11.1-1977. At about the same time DEC launched DSM-11 (Digital Standard MUMPS) for the PDP-11. This quickly dominated the market and became the reference implementation of the time.

During the early 1980s a number of vendors sprang up to market MUMPS-based platforms. The two largest were Digital Equipment Corporation with their DSM (Digital Standard MUMPS) product, and InterSystems with their ISM (InterSystems M) product on VMS and UNIX, and M/11+ on the PDP-11 platform. Other companies that developed important MUMPS implementations were: Greystone Technology Corporation with a compiled version called GT.M; DataTree Inc. with an Intel PC based product called DTM; Micronetics Design Corporation with a product line called MSM for UNIX and Intel PC platforms (later ported to IBM's VM operating system); and M-Global with MGM, a Mac OS based product. M-Global MUMPS was the first commercial MUMPS for the PC and the only Mac product. DSM-11 was superseded by VAX/DSM for the VAX/VMS platform. This was then ported to the Alpha in two variants, as DSM for OpenVMS and as DSM for Ultrix.

This period also saw a lot of activity by the MDC. The second revision of the ANSI standard for MUMPS (X11.1-1984) was approved on November 15, 1984. On November 11, 1990 the third revision of the ANSI standard (X11.1-1990) was approved. In 1992 this same standard was also adopted as ISO standard 11756-1992. Around this time the use of M as an alternative name was sanctioned. On December 8, 1995 the fourth revision of the standard (X11.1-1995) was approved by ANSI, and by ISO in 1999 as ISO 11756-1999. The MDC finalized a further revision to the standard in 1998 but this has never been presented to ANSI for approval. On 6 January 2005, ISO re-affirmed its MUMPS-related standards: ISO/IEC 11756:1999, language standard; ISO/IEC 15851:1999, Open MUMPS Interconnect; and ISO/IEC 15852:1999, MUMPS Windowing Application Programmers Interface.

By 2000, the middleware vendor InterSystems had become the dominant player in the market with the purchase of several of the other players. Initially they acquired DataTree Inc. in the early 1990s. On December 30, 1995, the acquisition of the DSM product line from DEC was announced. InterSystems then began to consolidate these products into a single product line, badging them on a number of platforms as OpenM. In 1997 InterSystems completed this consolidation by launching a unified successor named Caché. This was based on their ISM product but with some influences from the other products. The assets of Micronetics Design Corporation were also eventually acquired by InterSystems on June 21, 1998. InterSystems remains the dominant MUMPS vendor, selling Caché to MUMPS developers who write applications for a variety of operating systems.

Greystone Technology Corporation's GT.M product was sold to Sanchez Computer Associates Inc. (now part of Fidelity National Financial Inc.) in the mid 1990s. On November 7, 2000 Sanchez made GT.M for Linux available under the GPL license and on October 28, 2005 GT.M for OpenVMS and Tru64 UNIX were also made available under the GPL license. GT.M continues to be available on other UNIX platforms under a traditional license.

The newest implementation of MUMPS, released in April 2002, is an MSM derivative called M21 from the Real Software Company of Rugby, UK.

There are also several open source implementations of MUMPS.

One of the original creators of the MUMPS language, Neil Pappalardo, went on to found a company called Meditech. He extended and built on the MUMPS language, naming the new language MIIS (and later, MAGIC). Unlike InterSystems, Meditech does not sell middleware, so MIIS and MAGIC are only used internally at Meditech.

Major users of MUMPS applications

Veterans Administration and Department of Defense

The Veterans Administration (today known as the United States Department of Veterans Affairs) officially adopted MUMPS as the programming language to be used to implement an integrated laboratory / pharmacy / patient admission, tracking and discharge system in the early 1980s. The original version, the Decentralized Hospital Computer Program (DHCP), was delivered early and under budget. DHCP has been continuously extended in the years since. Most of the source code is available at no cost under the Freedom of Information Act. However, a major module, IFCAP (Integrated Funds Distribution, Control Point Activity, Accounting and Procurement), is not available to the general public (but is to hospitals) because it contains certain validation routines and accounting information that could be used for fraud. Before implementing DHCP, the VA also wrote an intermediate layer known as FileMan in MUMPS to act as a database management system. The VA then hired SAIC to carry out two pilot projects converting the MUMPS code into a Java/web-based solution, but the VA managed its money so poorly that the government withdrew the funding opportunities. One pilot was successful, converting 100% of the MUMPS code, including FileMan, to Java and web-enabling it. This pilot is open source and can be acquired from the VA.

Today, DHCP is known as Veterans Health Information Systems and Technology Architecture (VistA). The Hardhats.org website is the center for the international community of VistA developers and users and also serves something of the same function for MUMPS generally.

In the late 1980s the Department of Defense decided to implement a next-generation healthcare information system for the active military. The contract was awarded to SAIC, which developed the Composite Health Care System (CHCS). Rather than starting from scratch, SAIC started with DHCP and built on it. At this time, IBM decided to enter the market. Rather than develop its own MUMPS implementation, it licensed the Micronetics implementation. However, despite a lot of hype in the MUMPS community, IBM remained interested mainly in selling hardware. Tandem followed the same path, using the Micronetics implementation on its machines.

Nearly the entire VA hospital system in the United States and the Indian Health Service, as well as major parts of the Department of Defense CHCS hospital system all still run the system for clinical data tracking.

Other Industries

MUMPS also gained an early following in the financial sector, and MUMPS applications are still in use at many banks and credit unions.

As of 2005 most use of M is in the form of either Caché or GT.M. Caché in particular is being strongly marketed by InterSystems and is having some success in penetrating new markets, such as telecoms.

MUMPS Language Syntax

The M syntax allows multiple commands to appear on a line, grouped into procedures (subroutines) in a fashion similar to most structured programming systems.

In MUMPS syntax, spaces are significant. There are contexts in which a pair of spaces is interpreted differently from a single space; however, extra spaces may always be added between commands for clarity. Lines are syntactically significant, and carriage returns and linefeeds are not treated as white space. There is no requirement to put semicolons at the end of commands, and lines may be continued when needed.

Procedures - MUMPS Routines

A typical M procedure (a "routine" in MUMPS terminology) is analogous to a source file and consists of lines of MUMPS code executed in sequence. Labels can be applied to lines to create internal subroutines, which are referenced within the routine by the simple label and from outside the routine by the label and routine name separated by the 'up-arrow' character (actually the caret, as in SUBX^ABC). Calling the routine from its first line uses the routine name prefixed with the caret (e.g. DO ^ABC).

Within the routine ^ABC, labels are defined by starting a line with a label instead of a space or tab. One may reference the label within the routine as DO SUBX or outside as DO SUBX^ABC and it may or may not have a variable number of arguments and may return a value as a function.
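
A minimal sketch of such a routine follows. The routine name ABC and label SUBX are taken from the text above; the label DOUBLE and the message are hypothetical additions for illustration.

```mumps
ABC ; first line of the routine, reached by DO ^ABC
 DO SUBX                ; call an internal subroutine by its label
 WRITE $$DOUBLE(21),!   ; call a label as a function that returns a value
 QUIT
SUBX ; internal subroutine without arguments
 WRITE "in SUBX",!
 QUIT
DOUBLE(n) ; hypothetical label taking one argument
 QUIT n*2
```

From another routine the same entry points would be written DO SUBX^ABC and $$DOUBLE^ABC(21). Running DO ^ABC prints "in SUBX" and then 42.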

Variables and DataTypes

One main difference between M and most other languages is that M does not require the programmer to declare variables by datatype (or to declare them at all!). They are in effect strings. Numbers may be represented as strings. Use of a variable in a numeric context (addition, subtraction) invokes a well-defined conversion when the string is not a canonical number, such as "123 Main Street". If the programmer is adding addresses, there is a bigger problem than datatype!

M includes a complete and powerful set of string manipulation commands inherent in the language.

M has powerful sparse arrays, both for "local variables", which are process-connected and disappear with the termination of the process, and for database variables (called "global variables" in MUMPS terminology), which are hierarchical in concept and implementation.

They are sparse in that there is no requirement for contiguous nodes to exist: A(1), A(99) and A(100) may be used without having to define or allocate space for 2 through 98. Indeed, one can use fractional numbers (A(1.2), A(3.3), etc.) where the numbers have some meaning to the program. The powerful access function $ORDER(A(1.2)) returns the next defined key or subscript value, 3.3 in this example, so the program can effectively manage the data. This provides an automatic sort feature, inherent in the language.

This is used in global index functions where the sort key is used as a global subscript ( ^INDEX(lastname,firstname,SSSNumber)=... )
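
As a hedged illustration of such an index, the fragment below files three entries and then reads them back with nested $ORDER loops. The names and identifier values are invented, and the fragment assumes ^INDEX is empty beforehand.

```mumps
 ; Hypothetical index entries; the data lives entirely in the subscripts
 SET ^INDEX("Dobbs","Bob",1001)=""
 SET ^INDEX("Adams","Zoe",1002)=""
 SET ^INDEX("Dobbs","Alice",1003)=""
 ; Nested $ORDER loops visit the names in sorted order
 SET last=""
 FOR  SET last=$ORDER(^INDEX(last)) QUIT:last=""  DO
 . SET first=""
 . FOR  SET first=$ORDER(^INDEX(last,first)) QUIT:first=""  WRITE last,", ",first,!
```

The loop prints "Adams, Zoe", "Dobbs, Alice" and "Dobbs, Bob" in that order, with no explicit sort step in the program.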

SET A="abc"

creates the variable A and sets its value to the string. Subscripted nodes under the same name can coexist with it:

SET A(1,2)="def"

Subscripts may be string valued as well as integer or numeric-valued ( A(1.2) )

SET A("first_name")="Bob"
SET A("last_name")="Dobbs"

making the variables useful data stores on their own.

M local variables work in a similar fashion as with other programming languages, in that when the program exits, the value will be lost.

Global Variables - the Database

M comes into its own with its concept of globals, variables which are intrinsically stored in files and persist beyond the program or process completion. Globals appear as normal variables with the caret character in front of the name. Modifying the earlier example thus:

SET ^A("first_name")="Bob"
SET ^A("last_name")="Dobbs"

will result in a new record being created and inserted in the file structure, persistent just as a file persists in an operating system. Globals are stored in highly structured data files by the language and accessed only as MUMPS globals. The strength and efficiency of MUMPS is based on a long history of efficient, stable, theoretically sound cached/journaled and balanced B-tree key/value storage, including sophisticated transaction control for multiple-file transaction commit and roll-back at the language/operating system level. Huge databases grow randomly rather than in a forced serial order, and the M system handles it all invisibly to the programmer.

For all of these reasons one of the most common M programs is a database management system, providing all of the classic ACID properties on top of a generic M implementation. FileMan is one such example. InterSystems Caché allows two views of selected data structures, as MUMPS globals and as SQL datafiles, and has SQL built in to its implementation (M/SQL). MUMPS allows the programmer much wider control of the data, with no requirement to fit your world into square SQL boxes of rows and columns.

Multi-User, Multi-Tasking, Multi-Processor

MUMPS was multi-user when memory was measured in kilobytes of "core" (magnetic rings, for those of you who slept through the history of computers lecture) and processor time was scarce, but processors were even more so. Based on sound theory from the outset, M implementations include complete support for multi-tasking, multi-user, multi-machine programming in the context of whatever OS they function within. To demonstrate the ease of multi-machine support, consider:

SET ^|"DENVER"|A("first_name")="Bob"
SET ^|"DENVER"|A("last_name")="Dobbs"

which sets up A as before, but this time on the remote machine called "DENVER". M programs are thus trivial to distribute over many machines. This support also made it easy to expose the same sorts of distribution in the SQL (and other) layers, and it's not uncommon for M systems to be a better distributed SQL solution than a "real" SQL system.

Another use of M in more recent times has been to create object databases.

M can easily generate html or xml pages as well, and can be called via the CGI interface to serve web pages directly from the database. You could also use it as the backend for web applications using AJAX background communication.

M can also read delimited files easily, such as .csv (comma-separated values) files exported from spreadsheets.

Summary of key language features

This incomplete, informal sketch seeks to give programmers familiar with other languages a feeling for what M is like. Neither the language description nor the descriptions of each feature are complete, and many significant features have been omitted for brevity. These notes reflect the language circa 1994. ANSI X11.1-1995 gives a complete, formal description of the language; an annotated version of this standard is available online.

Data types: one universal datatype, interpreted/converted to string, integer, or floating-point number as context requires. Like Visual BASIC "variant" type.

Booleans: In IF statements and other conditionals, any nonzero value is treated as True. a<b yields 1 if a is less than b, 0 otherwise.

Declarations: NONE. Everything dynamically created on first reference.

Lines: important syntactic entities. Multiple statements per line are idiomatic. Scope of IF and FOR is "remainder of current line."

Case sensitivity: Commands and intrinsic functions are case-insensitive. Variable names and labels are case-sensitive. No specified meaning for upper vs. lower-case and no widely accepted conventions. Percent sign (%) is legal as first character of variables and labels.

Postconditionals: SET:N<10 A="FOO" sets A to "FOO" if N is less than 10; DO:N>100 PRINTERR performs PRINTERR if N is greater than 100. Provides a conditional whose scope is less than the full line.

Arrays: created dynamically, stored sparsely as B-trees, any number of subscripts, subscripts can be strings or numbers. Always automatically stored in sorted order. $ORDER and $QUERY functions allow traversal.

       for i=10000:1:12345 set sqtable(i)=i*i
       set address("Smith","Daniel")="dpbsmith@world.std.com"

Local arrays: names not beginning with caret; stored in process space; private to your process; expire when process terminates; available storage depends on partition size but is typically small (32K)

Global arrays: ^abc, ^def. Stored on disk, available to all processes, persist when process terminates. Very large globals (hundreds of megabytes) are practical and efficient. This is M's main "database" mechanism. Used instead of files for internal, machine-readable recordkeeping.

Indirection: in many contexts, @VBL can be used and effectively substitutes the contents of VBL into the statement. SET XYZ="ABC" SET @XYZ=123 sets the variable ABC to 123. SET SUBROU="REPORT" DO @SUBROU performs the subroutine named REPORT. Operational equivalent of "pointers" in other languages.

Piece function: Treats variables as broken into pieces by a separator. $PIECE(STRINGVAR,"^",3) means the "third caret-separated piece of STRINGVAR." Can appear as an assignment target. After

       SET X="dpbsmith@world.std.com" 

$PIECE("world.std.com",".",2) yields "std". SET $P(X,"@",1)="office" causes X to become "office@world.std.com".

Order function

       Set stuff(6)="xyz",stuff(10)=26,stuff(15)=""  

$Order(stuff("")) yields 6, $Order(stuff(6)) yields 10, $Order(stuff(8)) yields 10, $Order(stuff(10)) yields 15, $Order(stuff(15)) yields "".

       Set i="" For  Set i=$O(stuff(i)) Quit:i=""  Write !,i,?10,stuff(i)

The argumentless For iterates until stopped by the Quit. Prints a table of i and stuff(i) where i is successively 6, 10, and 15.

For a thorough listing of the rest of the MUMPS commands, operators, functions and special variables see the online resource

  • MUMPS by Example, or the (out of print) book of the same name by Ed de Moel. Much of the language syntax is detailed there, with examples of usage. There is also an annotated language standard, showing evolution of the language and differences between updates to the ANSI standard. It is available at
  • Annotated MUMPS Language Standard


"MUMPS" vs. "M"

While of little interest to outsiders, this topic was and is contentious within the MUMPS/M community.

All of the following opinions can be, and have been, supported by knowledgeable people at various times:

  • The name became M in 1993 when the M Technology Association adopted it.
  • The name became M on December 8th, 1995 with the approval of ANSI X11.1-1995
  • Both M and MUMPS are officially accepted names.
  • M is only an "alternate name" or "nickname" for the language, and MUMPS is still the official name.

Some of the contention arose in response to strong advocacy for the name M on the part of one particular commercial interest, InterSystems, whose CEO disliked the name MUMPS and felt that it represented a serious marketing obstacle for InterSystems. Thus advocacy for the name M to some extent became identified as alignment with InterSystems. The dispute also reflected rivalry between organizations (the M Technology Association, the MUMPS development committee, the ANSI and ISO standards committee) as to who determines the "official" name of the language. Some writers have attempted to defuse the issue by referring to the language as M[UMPS], square brackets being the customary notation for optional syntax elements.

The most recent standard (ISO/IEC 11756:1999, re-affirmed on 6 January 2005) still mentions both M and MUMPS as officially accepted names.

The December 31, 1840 epoch

In M, the current date and time is contained in a special system variable, $H (for "HOROLOG"). The format is a pair of integers separated by a comma, e.g. "54321,12345". The first number is the number of days since December 31st, 1840, i.e. day number 1 is January 1st, 1841; the second is the number of seconds since midnight.
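
A short sketch of taking such a value apart, using only standard operators. The sample value is the one quoted above rather than the live $HOROLOG, so that the result is reproducible; the variable names are arbitrary.

```mumps
 ; Sample value from the text; use SET h=$HOROLOG for the current time
 SET h="54321,12345"
 SET days=$PIECE(h,",",1),secs=$PIECE(h,",",2)
 ; \ is integer division and # is modulo; M evaluates strictly left to right
 SET hh=secs\3600,mm=secs#3600\60,ss=secs#60
 WRITE "day ",days,", time ",hh,":",mm,":",ss,!
```

For the sample value this prints "day 54321, time 3:25:45". The fields are not zero-padded, and converting the day count to a calendar date is left to library or vendor-specific functions.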

The reason for this choice of epoch is a bit of MUMPS trivia. James M. Poitras has written that he chose this epoch for the date and time routines in a package developed by his group at MGH in 1969: "I remembered reading of the oldest (one of the oldest?) U.S. citizen, a Civil War veteran, who was 121 years old at the time. Since I wanted to be able to represent dates in a Julian-type form so that age could be easily calculated and to be able to represent any birth date in the numeric range selected, I decided that a starting date in the early 1840s would be 'safe.' Since my algorithm worked most logically when every fourth year was a leap year, the first year was taken as 1841. The zero point was then December 31, 1840.... I wasn't party to the MDC negotiations, but I did explain the logic of my choice to members of the Committee."

(More colorful versions have circulated in the folklore, suggesting, for example, that December 31st, 1840 was the exact date of the first entry in the MGH records, but these appear to be legends).

For those who care about such things, $HOROLOG hit 60000 on April 10th, 2005; will be 70000 on August 26th, 2032; 80000 on January 12th, 2060; 90000 on May 30th, 2087; and 100000 on October 16th, 2114.

Opinion

Debates on the merits and drawbacks of the MUMPS language are virtually nil for pragmatic reasons. Most existing applications using MUMPS have been around since the 1970s and consist of large code collections that would be infeasible (or at least, very costly) to rewrite in another language. Software houses selling MUMPS-based applications rarely give the end-user the chance to interact with the language, so the applications are sold on their own merits, not the language's.

MUMPS' major competitor in the database-specialized language arena is SQL. SQL cannot usually be used on its own though, as it is not a complete language. Neither does SQL specify how the database is to be structured. When a MUMPS implementation is compared to alternatives, the alternative is usually a combination of several languages and a database vendor, for example SQL + C + Oracle. Some MUMPS vendors even support the SQL + MUMPS combination. MUMPS offers both more and less native functionality in different areas than SQL and C, and may either outperform or underperform Oracle depending on what you want to do. Comparisons are always difficult, perhaps explaining why there has never been a strong incentive to rewrite MUMPS applications in other languages. Brand-new database-driven applications are likely to be written in SQL and C, PHP or another popular language simply because there is a much wider talent pool of people with those skills.


Pro

MUMPS vendors have called MUMPS the 'Best-kept secret in IT', and Richard G. Davis (in Walters, 1989) commented that "Where economics has been a primary consideration... the MUMPS language has distinguished itself."

MUMPS advocates believe it to be undervalued, in part due to its venerable age, its facetious name, and its "total indifference to academic correctness".

MUMPS has a number of features to recommend it: it can run with minuscule system requirements, non-programmers can easily learn its simple yet rigid syntax, and new programmers can see results very quickly. The language offers many native features that are available in other languages only through libraries. MUMPS' advantages over other languages available in the 1970s are even clearer; it typically used far less memory and CPU resources than Lisp, and made it much easier for the programmer to interact with the database than Fortran.

MUMPS advocates often claim significant speed advantages over non-MUMPS competitors. A benchmark in the early 1980s sponsored by DEC found that DEC's MUMPS implementation running on DEC hardware was 3-6 times faster than Oracle implementations running on IBM and HP hardware. There do not seem to be any comparison studies with results publicly available after 1990.


Con

MUMPS's lack of popularity and its differences from modern languages in widespread use are perhaps its biggest drawbacks. String length/database node length limitations and lack of DBMS or object-oriented features are criticisms often cited by advocates of other database solutions.

Non-standard, vendor-specific workarounds are offered for most of these problems, but using them can make your code non-portable to other MUMPS implementations.

References

  • Walters, Richard (1989). ABCs of MUMPS. Butterworth-Heinemann. ISBN 1555580173.
  • Walters, Richard (1997). M Programming: A Comprehensive Guide. Digital Press. ISBN 1555581676.
  • Lewkowicz, John. The Complete MUMPS : An Introduction and Reference Manual for the MUMPS Programming Language. ISBN 0131621254
  • Kirsten, Wolfgang, et al. (2003). Object-Oriented Application Development Using the Caché Postrelational Database. ISBN 3540009604.
