These papers address the latest research and development efforts and highlight the human aspects of design and use of computing systems. The papers thoroughly cover the entire field of Human-Computer Interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas.
Statistics students and analysts alike might be overwhelmed when it comes to repeated measures or longitudinal data analyses. They might try to educate themselves by diving into textbooks or taking semester-long or intensive weekend courses, resulting in even more confusion. Some might try to ignore the repeated nature of data and take shortcuts such as analyzing all data as independent observations or analyzing summary statistics such as averages or changes from first to last points and ignoring all the data in between.
This hands-on presentation introduces longitudinal and repeated measures analyses without heavy emphasis on theory. Students in the workshop will have the opportunity to get hands-on experience graphing longitudinal and repeated measures data. Emphasis will be on continuous outcomes, but categorical outcomes will briefly be covered.
Government funding is a highly sought-after source of financing for entrepreneurs and researchers who aspire to break into the market and compete with larger organizations. The funding is channeled through federal agencies that seek out promising research and products that improve on existing work. This study uses a text analytic approach to analyze the abstracts of projects that earned government funding over the past three and a half decades in the fields of Defense and Health and Human Services.
This helps us understand the factors that might be responsible for awards made to a project. Initially, we analyze the trends in government funding of research by small businesses. Then we perform text mining on the 55, abstracts of the projects that are funded by the Department of Defense and the Department of Health.
From the text mining, we portray the key topics and patterns related to the past research and determine the changes in their trends over time. Through our study, we highlight the research trends to enable organizations to shape their business strategy and make better investments in the future research.
Although business intelligence experts agree that empowering businesses through a well-constructed semantic layer has undisputed benefits, a successful implementation has always been a formidable challenge. A correctly implemented semantic layer provides business users with quick and easy access to information for analytical and fact-based decision-making.
Today, everyone talks about how the modern data platform enables businesses to store and analyze big data, but we still see most businesses trying to generate value from the data that they already store. From self-service to data visualization, business intelligence and descriptive analytics are still the key requirements for any business, and we discuss how to use SAS Visual Analytics to address them all. We also describe the key considerations in strategy, people, process, and data for a successful semantic layer rollout that uses SAS Visual Analytics.
Among the more well-known uses of this data are analyses published by the Wall Street Journal showing a large, and in some cases shocking, discrepancy between what hospitals potentially charge the uninsured and what they are paid by Medicare for the same procedure.
Analyses such as these highlight both potential inequities in the US health care system and, more importantly, potential opportunities for its reform. However, while capturing the public imagination, analyses such as these are but one means to capitalize on the remarkable wealth of information this data provides. Specifically, data from the public distribution CMS data can help both researchers and the public better understand the burden specific conditions and medical treatments place on the US health care system.
It was this simple, but important objective that motivated the present study. Our specific analyses focus on two of what we believe to be important questions. First, using the total number of hospital discharges as a proxy for incidence of a condition or treatment, which have the highest incidence rates nationally? And, is there variability in these incidence rates across states?
Second, as psychologists, we are necessarily interested in understanding the state of mental health care. To date, and to the best of our knowledge, there has been no study utilizing the public inpatient Medicare provider utilization and payment data set to explore the utilization of mental illness services funded by Medicare. In the regulatory world of patient safety and pharmacovigilance, whether during clinical trials or post-market surveillance, serious adverse events (SAEs) that affect participants must be collected and, if certain criteria are met, reported to the FDA and other regulatory authorities.
SAEs are often entered into multiple databases by various users, resulting in possible data discrepancies and quality loss. Efforts have been made to reconcile the SAE data between databases, but there is no industrial standard regarding the methodology or tool employed for this task. Some organizations still reconcile the data manually, with visual inspections and vocal verification.
Not only is this laborious and error-prone, it becomes prohibitive when the data reach hundreds of records. Our algorithm identifies matched, discrepant, and unpaired SAE records. Additionally, it employs a user-supplied list of synonyms to find non-identical but relevant matches. Record counts and Levenshtein edit distances are calculated within certain groups to assist with sorting and matching.
This combined record list is then fed into a DATA step to decide whether a record is paired or unpaired. For an unpaired record, a stub record with all fields set as? Each record is written to one of two data sets. Later, the data sets are tagged and pulled into a comparison logic using hash objects, which enable field-by-field comparison and display discrepancies in clean format for each field. Identical fields or columns are cleared or removed for clarity. The result is a streamlined and user-friendly process that allows for fast and easy SAE reconciliation.
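The matching idea described above (exact matches, synonym-based matches, and near matches ranked by Levenshtein edit distance) can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' SAS implementation: the synonym table, the `max_distance` threshold, and the function names are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


# Hypothetical user-supplied synonym list: each key is mapped to a preferred term.
SYNONYMS = {"heart attack": "myocardial infarction"}


def normalize(term: str) -> str:
    """Lowercase, collapse whitespace, then apply the synonym list."""
    cleaned = " ".join(term.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


def classify_pair(term_a: str, term_b: str, max_distance: int = 2) -> str:
    """Label two event terms as matched, discrepant (close), or unpaired (far apart)."""
    a, b = normalize(term_a), normalize(term_b)
    if a == b:
        return "matched"
    if levenshtein(a, b) <= max_distance:
        return "discrepant"
    return "unpaired"
```

For example, `classify_pair("Heart Attack", "myocardial infarction")` is treated as a match through the synonym list, while a small typo such as "nausae" for "nausea" is reported as a discrepancy for a reviewer to resolve.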
A simple preprocessing technique creates many small image patches from larger images. These patches encourage the learned patterns to have local scale, which follows well-known statistical properties of natural images. In addition, these patches reduce the number of features that are required to represent an image and can decrease the training time that algorithms need in order to learn from the images.
If a training label is available, a classifier is trained to identify patches of interest. In the unsupervised case, a stacked autoencoder network is used to generate a dictionary of representative patches, which can be used to locate areas of interest in new images.
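The patch-creation step can be illustrated with a short Python sketch. It assumes an image stored as a list of rows of pixel values. The `patch_size` and `stride` parameters are illustrative choices, not values taken from the paper.

```python
def extract_patches(image, patch_size, stride):
    """Slide a square window over a 2-D image and collect every full patch.

    image: list of equal-length rows (lists of pixel values).
    patch_size: side length of each square patch.
    stride: step between neighboring patches (stride < patch_size overlaps them).
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - patch_size + 1, stride):
        for left in range(0, cols - patch_size + 1, stride):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# A 4x4 toy image split into four non-overlapping 2x2 patches.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
toy_patches = extract_patches(toy_image, patch_size=2, stride=2)
```

Each patch has `patch_size * patch_size` features regardless of the size of the original image, which is where the reduction in feature count comes from.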
This technique can be applied to pattern recognition problems in general, and this paper presents examples from the oil and gas industry and from a solar power forecasting application. When dealing with non-normal categorical response variables, logistic regression is a robust method to use for modeling the relationship between categorical outcomes and different predictors without assuming a linear relationship between them.
Within such models, the categorical outcome might be binary, multinomial, or ordinal, and predictors might be continuous or categorical. Another complexity that might be added to such studies is when data is longitudinal, such as when outcomes are collected at multiple follow-up times. Learning about modeling such data within any statistical method is beneficial because it enables researchers to look at changes over time.
This study looks at several methods of modeling binary and categorical response variables within regression models by using real-world data. To assess binary outcomes, the current study models binary data in the absence and presence of correlated observations under regular logistic regression and mixed logistic regression.
To assess multinomial outcomes, the current study uses multinomial logistic regression. When responses are ordered, using ordinal logistic regression is required as it allows for interpretations based on inherent rankings. Different logit functions for this model include the cumulative logit, adjacent-category logit, and continuation ratio logit. Each of these models is also considered for longitudinal panel data using methods such as mixed models and generalized estimating equations (GEE).
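To make the three ordinal logit functions concrete, the sketch below computes each of them from a vector of category probabilities. Textbooks differ on the direction of each contrast (for example, whether the continuation ratio compares a category with higher or lower categories), so the definitions in the comments are one common convention and an assumption here, not necessarily the one used in the study.

```python
import math


def ordinal_logits(probs):
    """Return cumulative, adjacent-category, and continuation-ratio logits.

    probs: probabilities of ordered categories 1..J (must sum to 1, all > 0).
    Each returned list has J - 1 entries, one per cut point j = 1..J-1.
    """
    n_categories = len(probs)
    cumulative, adjacent, continuation = [], [], []
    for j in range(n_categories - 1):
        at_or_below = sum(probs[:j + 1])   # P(Y <= j)
        above = sum(probs[j + 1:])         # P(Y > j)
        cumulative.append(math.log(at_or_below / above))   # log P(Y<=j)/P(Y>j)
        adjacent.append(math.log(probs[j] / probs[j + 1]))  # log P(Y=j)/P(Y=j+1)
        continuation.append(math.log(probs[j] / above))     # log P(Y=j)/P(Y>j)
    return cumulative, adjacent, continuation


cum, adj, cont = ordinal_logits([0.2, 0.3, 0.5])
```

At the first cut point the cumulative and continuation-ratio logits coincide, because "at or below category 1" and "exactly category 1" are the same event.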
The final consideration, which cannot be addressed by GEE, is the conditional logit to examine bias due to omitted explanatory variables at the cluster level. Such a large data set gives endless research opportunities for researchers and health-care professionals. However, patient care data is complex and might be difficult to manage. Breast cancer is the second leading cause of cancer deaths among women in the United States.
Although mortality rates have been decreasing over the past decade, it is important to continue to make advances in diagnostic procedures as early detection vastly improves chances for survival. The goal of this study is to accurately predict the presence of a malignant tumor using data from fine needle aspiration (FNA) with visual interpretation.
Compared with other methods of diagnosis, FNA displays the highest likelihood for improvement in sensitivity. Furthermore, this study aims to identify the variables most closely associated with accurate outcome prediction. The data set contains clinical case samples. The study analyzes a variety of traditional and modern models, including logistic regression, decision tree, neural network, support vector machine, gradient boosting, and random forest.
Prior to model building, the weights of evidence (WOE) approach was used to account for the high dimensionality of the categorical variables, after which variable selection methods were employed. Ultimately, the gradient boosting model utilizing a principal component variable reduction method was selected as the best prediction model with a 2.
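As an illustration of the weights of evidence transformation mentioned above, the following sketch replaces each level of a categorical variable with a single numeric value. It uses the common definition WOE = ln(share of non-events / share of events). The sign convention and the `smoothing` constant (added to avoid division by zero for sparse levels) are assumptions, not details from the study.

```python
import math
from collections import Counter


def weight_of_evidence(categories, targets, smoothing=0.5):
    """Map each category level to ln(% of non-events / % of events).

    categories: iterable of category labels, one per observation.
    targets: iterable of 0/1 outcomes (1 = event), aligned with categories.
    smoothing: pseudo-count added to every cell (hypothetical default).
    """
    events, non_events = Counter(), Counter()
    for level, outcome in zip(categories, targets):
        if outcome == 1:
            events[level] += 1
        else:
            non_events[level] += 1

    levels = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(levels)
    total_non_events = sum(non_events.values()) + smoothing * len(levels)

    woe = {}
    for level in levels:
        share_events = (events[level] + smoothing) / total_events
        share_non_events = (non_events[level] + smoothing) / total_non_events
        woe[level] = math.log(share_non_events / share_events)
    return woe
```

Because every level collapses to one number, a categorical input with many levels enters the model as a single numeric column instead of many indicator columns.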
Additionally, the uniformity of cell shape and size, bare nuclei, and bland chromatin were consistently identified as the most important FNA characteristics across variable selection methods. These results suggest that future research should attempt to refine the techniques used to determine these specific model inputs.
Greater accuracy in characterizing the FNA attributes will allow researchers to develop more promising models for early detection. Using smart clothing with wearable medical sensors integrated to keep track of human health is now attracting many researchers. To overcome this problem, recognizing human activities, determining the relationship between activities and physiological signals, and removing noise from the collected signals are essential steps.
This paper focuses on the first step, which is human activity recognition. For this study, two data sets were collected from an open repository. Both data sets have input variables and one nominal target variable with four levels. Principal component analysis along with other variable reduction and selection techniques were applied to reduce dimensionality in the input space.
Several modeling techniques with different optimization parameters were used to classify human activity. The gradient boosting model was selected as the best model based on a test misclassification rate of 0. That is, Users new to SAS or to the health-care field may find an overview of existing as well as new applications helpful.
Risk-adjustment software, including the publicly available Health and Human Services (HHS) risk software that uses SAS and was released as part of the ACA implementation, is one example of code that is significantly improved by the use of arrays. Similar projects might include evaluations of diagnostic persistence, comparisons of diagnostic frequency or intensity between providers, and checks for unusual clusters of diagnosed conditions.
This session reviews examples suitable for intermediate SAS users, including the application of two-dimensional arrays to diagnosis fields. Bayesian inference for complex hierarchical models with smoothing splines is typically intractable, requiring approximate inference methods for use in practice.
However, for large or complex models, MCMC can be computationally intensive, or even infeasible. It provides an approximating distribution that has minimum Kullback-Leibler distance to the posterior. To improve speed and memory efficiency, we use block decomposition to streamline the estimation of the large sparse covariance matrix.
We also provide practical demonstrations of how to estimate additional posterior quantities of interest from MFVB either directly or via Monte Carlo simulation. The surge of data and data sources in marketing has created an analytical bottleneck in most organizations. Analytics departments have been pushed into a difficult decision: either purchase black-box analytical tools to generate efficiencies or hire more analysts, modelers, and data scientists.
Knowledge gaps stemming from restrictions in black-box tools or from backlogs in the work of analytical teams have resulted in lost business opportunities. Existing big data analytics tools respond well when dealing with large record counts and small variable counts, but they fall short in bringing efficiencies when dealing with wide data.
This paper discusses the importance of an agile modeling engine designed to deliver productivity, irrespective of the size of the data or the complexity of the modeling approach. Through it, users can access data and metadata for over 1, indicators from approximately federal and nonfederal sources.
An API serves as a communication interface for integration. This paper provides detailed information about how to access HIW data with SAS Visual Analytics in order to produce easily understood visualizations with minimal effort through a methodology that automates HIW data processing. Use cases involving dashboards are also examined in order to demonstrate the value of streaming data directly from the HIW. This can be very helpful to organizations that want to lower maintenance costs associated with data management while gaining insights into health data with visualizations.
This paper provides a starting point for any organization interested in deriving full value from SAS Visual Analytics while augmenting their work with HIW data. In the biopharmaceutical industry, biostatistics plays an important and essential role in the research and development of drugs, diagnostics, and medical devices.
This paper provides a broad overview of the different types of jobs and career paths available, discusses the education and skill sets needed for each, and presents some ideas for overcoming entry barriers to careers in biostatistics and clinical SAS programming. Graphs are essential for many clinical and health care domains, including analysis of clinical trials safety data and analysis of the efficacy of the treatment, such as change in tumor size.
One of the major diseases that records a high number of readmissions is bacterial pneumonia in Medicare patients. This study aims at comparing and contrasting Northeast and South regions of the United States based on the factors causing the day readmissions. The study also identifies some of the ICD-9 medical procedures associated with those readmissions.
Further, the study suggests some preventive measures to reduce readmissions. The day readmissions are computed based on admission and discharge dates from until Using clustering, various hospitals, along with discharge disposition levels (where a patient is sent after discharge), are grouped. In both regions, the patients who are discharged to home have shown significantly lower chances of readmission.
Also, some of the hospital groups have higher readmission cases. Research has found that during these procedures, patients are highly susceptible to acquiring Methicillin-resistant Staphylococcus aureus (MRSA) bacteria, which causes Methicillin-susceptible pneumonia. Providing timely follow-up for the patients who undergo these procedures might limit readmissions.
These patients might also be discharged to home under medical supervision, as such patients had shown significantly lower chances of readmission. Suppose that you have a very large data set with some specific values in one of its columns, and you want to split the entire data set into separate comma-separated values (CSV) files based on the values present in that column. If you divide that data set into CSV files, the conventional, manual process of converting each of the separated data sets is tedious and frustrating.
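The splitting task described above can be sketched with Python's standard `csv` module. The paper itself uses SAS code, so this is only an analogous illustration. The function name, the file-naming rule, and the assumption that the source file has a header row are all hypothetical.

```python
import csv
from pathlib import Path


def split_csv_by_column(source_path, column, output_dir):
    """Write one CSV file per distinct value found in `column`.

    Assumes the source file has a header row. Returns the sorted list of
    distinct values. Note: one file handle stays open per distinct value, and
    two values that differ only in punctuation would share a file name.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    handles, writers = {}, {}
    try:
        with open(source_path, newline="") as source:
            reader = csv.DictReader(source)
            for row in reader:
                value = row[column]
                if value not in writers:
                    # Build a file-system-safe name from the column value.
                    safe = "".join(c if c.isalnum() else "_" for c in value) or "blank"
                    handle = open(output_dir / f"{safe}.csv", "w", newline="")
                    handles[value] = handle
                    writers[value] = csv.DictWriter(handle, fieldnames=reader.fieldnames)
                    writers[value].writeheader()
                writers[value].writerow(row)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

A call such as `split_csv_by_column("all_records.csv", "region", "by_region")` (hypothetical file and column names) would produce one file per region in a single pass over the data.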
In these two processes, the whole tedious process is done automatically using the SAS code. Competing-risks analyses are methods for analyzing the time to a terminal event such as death or failure and its cause or type. The cumulative incidence function CIF(j, t) is the probability of death by time t from cause j. New options in the LIFETEST procedure provide for nonparametric estimation of the CIF from event times and their associated causes, allowing for right-censoring when the event and its cause are not observed.
Cause-specific hazard functions that are derived from the CIFs are the analogs of the hazard function when only a single cause is present. Death by one cause precludes occurrence of death by any other cause, because an individual can die only once. Incorporating explanatory variables in hazard functions provides an approach to assessing their impact on overall survival and on the CIF. The Fine-Gray model defines a subdistribution hazard function that has an expanded risk set, which consists of individuals at risk of the event by any cause at time t, together with those who experienced the event before t from any cause other than the cause of interest j.
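For intuition about nonparametric CIF estimation, here is a small sketch of the standard estimator that accumulates, at each event time, the all-cause survival just before that time multiplied by the fraction of at-risk subjects who fail from the cause of interest. It is a teaching sketch in Python, not the LIFETEST implementation. The coding of censored observations as cause 0 is an assumption of this example.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause under right-censoring.

    times: observed times (event or censoring).
    causes: 0 for right-censored, otherwise a positive cause code.
    Returns a list of (time, CIF) pairs, one per distinct time at which an
    event of any cause occurred.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    survival = 1.0  # all-cause survival just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_any = events_interest = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            cause = data[i][1]
            leaving += 1
            if cause != 0:
                events_any += 1
                if cause == cause_of_interest:
                    events_interest += 1
            i += 1
        if events_any > 0:
            cif += survival * events_interest / n_at_risk
            survival *= 1 - events_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= leaving
    return curve
```

With no censoring the final CIF value equals the observed proportion of subjects who failed from the cause of interest, which is a convenient sanity check.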
Finally, with additional assumptions a full parametric analysis is also feasible. We illustrate the application of these methods with empirical data sets. This poster presents a variety of data visualizations the analyst will encounter when describing a health-care population. Among the topics we cover are SAS Visual Analytics Designer object options (including geo bubble map, geo region map, crosstab, and treemap), tips for preparing your data for use in SAS Visual Analytics, tips on filtering data after it's been loaded into SAS Visual Analytics, and more.
This technology increases availability, allows parallel processing, accommodates increasing demand by scaling out, and offers other features that make life better for those managing and using these environments. However, even when business users take advantage of these features, they are more concerned about the business part of the problem. Most of the time business groups hold the budgets and are key stakeholders for any SAS Grid Manager project.
Therefore, it is crucial to demonstrate to business users how they will benefit from the new technologies, how the features will improve their daily operations, help them be more efficient and productive, and help them achieve better results. This paper guides you through a process to create a strong and persuasive business plan that translates the technology features from SAS Grid Manager to business benefits. Introduction: Cycling is on the rise in many urban areas across the United States.
With the broad-ranging personal and public health benefits of cycling, it is important to understand factors that are associated with these traffic-related deaths. There are more distracted drivers on the road than ever before, but the question remains of the extent that these drivers are affecting cycling fatality rates.
We use a novel machine learning approach, adaptive LASSO, to determine the relevant features and estimate their effect. Results: If a cyclist makes an improper action at or just before the time of the crash, the likelihood of the driver of the vehicle being distracted decreases. At the same time, if the driver is speeding or has failed to obey a traffic sign and fatally hits a cyclist, the likelihood of them also being distracted increases.
Being distracted is related to other risky driving practices when cyclists are fatally injured. Environmental factors such as weather and road condition did not impact the likelihood that a driver was distracted when a cyclist fatality occurred. During the course of a clinical trial study, large numbers of new and modified data records are received on an ongoing basis.
Providing end users with an approach to continuously review and monitor study data, while enabling them to focus reviews on new or modified incremental data records, allows for greater efficiency in identifying potential data issues. In addition, supplying data reviewers with a familiar machine-readable output format (for example, Microsoft Excel) allows for greater flexibility in filtering, highlighting, and retention of data reviewers' comments.
Upon each execution, the listings are compared against previously reviewed data to flag new and modified records, as well as carry forward any data reviewers' comments made during the previous review. Overall, this approach provides a significantly improved end-user experience above and beyond the more traditional approach of performing cumulative or incremental data reviews using PDF listings.
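The compare-and-carry-forward logic can be sketched as follows. This is an illustrative Python outline, not the authors' program: the record layout (dictionaries), the key fields, and the `status` and `comment` field names are hypothetical.

```python
REVIEW_FIELDS = ("status", "comment")  # hypothetical reviewer-maintained columns


def flag_changes(previous, current, key_fields):
    """Flag each current record as new, modified, or unchanged.

    previous: records from the last reviewed listing (may include comments).
    current: records from the latest data extract (data fields only).
    key_fields: names of the fields that uniquely identify a record.
    Reviewer comments on existing records are carried forward.
    """
    def record_key(record):
        return tuple(record[field] for field in key_fields)

    prior = {record_key(record): record for record in previous}
    flagged = []
    for record in current:
        old = prior.get(record_key(record))
        if old is None:
            status, comment = "new", ""
        else:
            old_data = {k: v for k, v in old.items() if k not in REVIEW_FIELDS}
            status = "unchanged" if old_data == record else "modified"
            comment = old.get("comment", "")
        flagged.append({**record, "status": status, "comment": comment})
    return flagged
```

A reviewer can then filter the output on `status` to see only new or modified records while earlier comments remain attached to the rows they were written for.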
We might decide to manually run the program every time we get a request, or we might easily schedule an automatic task to send a report at a specific date and time. Both scenarios have some disadvantages. If the report is manual, we have to find and run the program every time someone requests an updated version of the output.
It takes some time and it is not the most interesting part of the job. If we schedule an automatic task in Windows, we still sometimes get an email from the customers because they need the report immediately. That means that we have to find and run the program for them. We had developed many reports for different customer groups, and we were getting more and more emails from them asking for updated versions of their reports.
We felt we were not using our time wisely and decided to create an infrastructure where users could easily run their programs through a web interface. The tool that we created enables SAS programmers to easily release on-demand web reports with minimum programming. It has web interfaces developed using stored processes for the administrative tasks, and it also automatically customizes the front end based on the user who connects to the website.
One of the challenges of the project was that certain reports had to be available to a specific group of users only. This paper presents an application based on predictive analytics and feature-extraction techniques to develop an alternative method for diagnosis of obstructive sleep apnea (OSA).
Our method reduces the time and cost associated with the gold standard, polysomnography (PSG), which is operated manually, by automatically determining the severity of a patient's OSA via classification models using the time series from a one-lead electrocardiogram (ECG). The data is from Dr. Thomas Penzel of Philipps-University, Germany, and can be downloaded at www. The selected data consists of 10 recordings (7 OSAs and 3 controls) of ECG collected overnight, and non-overlapping minute-by-minute OSA episode annotations (apnea and non-apnea states).
This accounts for a total of 4, events 2, non-apnea and 2, apnea minutes. Moreover, because different OSA symptoms occur at different times, we account for this by including features from adjacent minutes in the analysis, and we select only the important ones using a decision tree model. The best classification result in the validation data obtained from the Random Forest model is The results suggest our method is well comparable to the gold standard.
Disparities in the Receipt of Cardiac Revascularization Procedures. While cardiac revascularization procedures like cardiac catheterization, percutaneous transluminal angioplasty, and cardiac artery bypass surgery have become standard practices in restorative cardiology, the practice is not evenly prescribed or subscribed to. We analyzed Florida hospital discharge records for the period to to determine the odds of receipt of any of these procedures by Hispanics and non-Hispanic Whites.
Covariates (potential confounders) were age, insurance type, gender, and year of discharge. Additional covariates considered included comorbidities such as hypertension, diabetes, obesity, and depression. The results indicated that even after adjusting for covariates, Hispanics in Florida during the time period to were consistently less likely to receive these procedures than their White counterparts.
Reasons for this phenomenon are discussed. Ensemble Modeling: Recent Advances and Applications. Ensemble models are a popular class of methods for combining the posterior probabilities of two or more predictive models in order to create a potentially more accurate model. This paper summarizes the theoretical background of recent ensemble techniques and presents examples of real-world applications.
Examples of these novel ensemble techniques include weighted combinations such as stacking or blending of predicted probabilities in addition to averaging or voting approaches that combine the posterior probabilities by adding one model at a time.
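The simplest of these combination rules can be written out directly. The sketch below shows averaging, a fixed-weight blend, and majority voting over the posterior probabilities of several models. In full stacking the weights would be learned by fitting a second-stage model on held-out predictions, so the fixed weights used here are a simplifying assumption.

```python
def average_probabilities(model_probs):
    """Unweighted mean of each observation's probability across models.

    model_probs: list of per-model lists, each holding one posterior
    probability per observation (all lists the same length).
    """
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def weighted_blend(model_probs, weights):
    """Weighted mean across models; weights need not sum to 1."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, column)) / total
            for column in zip(*model_probs)]


def majority_vote(model_probs, threshold=0.5):
    """Predict 1 when more than half of the models exceed the threshold."""
    return [1 if 2 * sum(p >= threshold for p in column) > len(column) else 0
            for column in zip(*model_probs)]
```

For two models that predict `[0.2, 0.8]` and `[0.4, 0.6]`, averaging gives roughly `[0.3, 0.7]`, while weights of 1 and 3 pull the blend toward the second model.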
As data management professionals, you have to comply with new regulations and controls. To respond to these new demands, you have to put processes and methods in place to automate metadata collection and analysis, and to provide rigorous documentation around your data flows. You also have to deal with many aspects of data management including data access, data manipulation (ETL and other), data quality, data usage, and data consumption, often from a variety of toolsets that are not necessarily from a single vendor.
It highlights best practices such as implementing a business glossary and establishing controls for monitoring data. Attend this session to become familiar with the SAS tools used to meet the new requirements and to implement a more managed environment. PROC IRT enables you to perform item parameter calibration and latent trait estimation using a wide spectrum of educational and psychological research.
This analysis offers a great choice to the growing population of IRT users. A shift to SAS can be beneficial based on several features of SAS: its flexibility in data management, its power in data analysis, its convenient output delivery, and its increasing richness in graphical presentation.
It is critical to ensure the quality of item parameter calibration and trait estimation before you can continue with other components, such as test scoring, test form constructions, IRT equatings, and so on. The use of logistic models for independent binary data has relied first on asymptotic theory and later on exact distributions for small samples, as discussed by Troxler, Lalonde, and Wilson. While the use of logistic models for dependent analysis based on exact analyses is not common, it is usually presented in the case of one-stage clustering.
The accuracy of the method and the results are compared to results obtained using an R program. This paper explores key SAS technologies that run inside the Hadoop parallel processing framework and prepares you to get started with them. Are you even aware of it? Maybe your boss needs to see the values of some rows in boldface and others highlighted in a stylish yellow. Perhaps one of the columns in the report needs to display a variety of fashionable formats some with varying decimal places and some without any decimals.
Maybe the customer needs to see a footnote in specific cells of the report. The paper shows quick tips and simple code to handle multiple formats within the same column, make the values in the Total rows boldface, apply traffic lighting, and add footnotes to any cell based on the column or row. Hierarchical nonlinear mixed models are complex models that occur naturally in many fields.
The NLMIXED procedure's ability to fit linear or nonlinear models with standard or general distributions enables you to fit a wide range of such models. This paper uses an example to illustrate the new functionality. We introduce age-period-cohort (APC) models, which analyze data in which performance is measured by age of an account, account open date, and performance date.
We demonstrate this flexible technique with an example from a recent study that seeks to explain the root causes of the US mortgage crisis. In addition, we show how APC models can predict website usage, retail store sales, salesperson performance, and employee attrition.
We even present an example in which APC was applied to a database of tree rings to reveal climate variation in the southwestern United States. There are many methods to randomize participants in randomized control trials. If it is important to have approximately balanced groups throughout the course of the trial, simple randomization is not a suitable method.
Perhaps the most common alternative method that provides balance is the blocked randomization method. A less well-known method called the treatment adaptive randomized design also achieves balance. This paper shows you how to generate an entire randomization sequence to randomize participants in a two-group clinical trial using the adaptive biased coin randomization design (ABCD), prior to recruiting any patients.
Such a sequence could be used in a central randomization server. A unique feature of this method allows the user to determine the extent to which imbalance is permitted to occur throughout the sequence while retaining the probabilistic nature that is essential to randomization. Properties of sequences generated by the ABCD approach are compared to those generated by simple randomization, a variant of simple randomization that ensures balance at the end of the sequence, and by blocked randomization.
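To show how a biased coin can limit imbalance while remaining random, the sketch below generates a complete two-arm sequence in advance. The allocation rule is an illustrative assumption, not necessarily the rule used in the paper: when the arms are balanced the coin is fair, and when they differ by d the under-represented arm is chosen with probability d^a / (d^a + 1). The tuning parameter `a` and the function names are hypothetical. Larger values of `a` tolerate less imbalance.

```python
import random


def biased_coin_sequence(n, a=2.0, seed=None):
    """Generate n assignments ("A" or "B") with an imbalance-dependent coin.

    a: hypothetical tuning parameter; a = 0 reduces to simple randomization,
       large a forces the arms back toward balance almost deterministically.
    seed: optional seed so the sequence can be reproduced and audited.
    """
    rng = random.Random(seed)
    sequence = []
    imbalance = 0  # (number assigned to A) minus (number assigned to B)
    for _ in range(n):
        if imbalance == 0:
            prob_a = 0.5
        else:
            d = abs(imbalance)
            favor_lagging = d ** a / (d ** a + 1)  # chance of picking lagging arm
            prob_a = favor_lagging if imbalance < 0 else 1 - favor_lagging
        arm = "A" if rng.random() < prob_a else "B"
        imbalance += 1 if arm == "A" else -1
        sequence.append(arm)
    return sequence


def max_imbalance(sequence):
    """Largest absolute difference in arm counts at any point in the sequence."""
    running, worst = 0, 0
    for arm in sequence:
        running += 1 if arm == "A" else -1
        worst = max(worst, abs(running))
    return worst
```

Comparing `max_imbalance` for sequences generated with small and large values of `a` gives a quick feel for the trade-off between balance and unpredictability that the paper examines.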
If your organization already deploys one or more software solutions via Amazon Web Services (AWS), you know the value of the public cloud. AWS provides a scalable public cloud with a global footprint, allowing users access to enterprise software solutions anywhere at any time.
In this paper, we describe how we extended our enterprise hosting business to AWS. We describe the open source automation framework from which SAS Solutions OnDemand built our automation stack, which simplified the process of migrating a SAS implementation. We'll provide the technical details of our automation and network footprint, a discussion of the technologies we chose along the way, and a list of lessons learned. Graphing Made Easy for Project Management. Project management is a hot topic across many industries, and there are multiple commercial software applications for managing projects available.
The reality, however, is that the majority of project management software is not applicable for daily usage. SAS clients, in real time, can use GTL to visualize resource assignments, task plans, delivery tracking, and project status across multiple project levels for more efficient project management. Since Atul Gawande popularized the term in describing the work of Dr. Jeffrey Brenner in a New Yorker article, hot-spotting has been used in health care to describe the process of identifying super-utilizers of health care services, then defining intervention programs to coordinate and improve their care.
Analyzing administrative health care claims data, which contains information about diagnoses, treatments, costs, charges, and patient sociodemographic data, can be a useful way to identify super-users, as well as those who may be receiving inappropriate care. Both groups can be targeted for care management interventions. In this paper, techniques for patient outlier identification and prioritization are discussed using examples from private commercial and public health insurance claims data.
The paper also describes techniques used with health care claims data to identify high-risk, high-cost patients and to generate analyses that can be used to prioritize patients for various interventions to improve their health. Contemporary data-collection processes usually involve recording information about the geographic location of each observation. This geospatial information provides modelers with opportunities to examine how the interaction of observations affects the outcome of interest.
For example, it is likely that car sales from one auto dealership might depend on sales from a nearby dealership either because the two dealerships compete for the same customers or because of some form of unobserved heterogeneity common to both dealerships. Knowledge of the size and magnitude of the positive or negative spillover effect is important for creating pricing or promotional policies. This paper provides tips and techniques to speed up the validation process without and with automation.
For validation without automation, it introduces both standard use and clever use of options and statements to be implemented in the COMPARE procedure that can speed up the validation process. Have you ever wondered how to get the most from Web 2. How to make those graphs dynamic, so that users can explore the data in a controlled way, without needing prior knowledge of SAS products or data science?
Wonder no more! In this session, you learn how to turn basic sashelp. Statistical quality improvement is based on understanding process variation, which falls into two categories: variation that is natural and inherent to a process, and unusual variation due to specific causes that can be addressed.
If you can distinguish between natural and unusual variation, you can take action to fix a broken process and avoid disrupting a stable process. A control chart is a tool that enables you to distinguish between the two types of variation. In many health care activities, carefully designed processes are in place to reduce variation and limit adverse events.
The types of traditional control charts that are designed to monitor defect counts are not applicable to monitoring these rare events, because these charts tend to be overly sensitive, signaling unusual variation each time an event occurs. In contrast, specialized rare events charts are well suited to monitoring low-probability events. These charts have gained acceptance in health care quality improvement applications because of their ease of use and their suitability for processes that have low defect rates.
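To make the contrast with traditional count charts concrete, the following is a minimal sketch of one way to set limits for a rare events chart. It assumes the number of cases between consecutive events follows a geometric distribution with a known event probability `p`; the abstract does not say which chart or distribution the paper uses, and the function names and the 0.00135 tail probability are illustrative assumptions.

```python
import math


def geometric_cdf(k: int, p: float) -> float:
    """P(X <= k), where X counts the non-event cases before the next event."""
    return 1.0 - (1.0 - p) ** (k + 1)


def geometric_quantile(p: float, q: float) -> int:
    """Smallest k with P(X <= k) >= q, for per-case event probability p."""
    k = math.ceil(math.log(1.0 - q) / math.log(1.0 - p)) - 1
    return max(k, 0)


def rare_event_limits(p: float, tail: float = 0.00135) -> tuple[int, int]:
    """Probability-based lower and upper limits for cases between events."""
    return geometric_quantile(p, tail), geometric_quantile(p, 1.0 - tail)


if __name__ == "__main__":
    # Hypothetical process: an adverse event occurs in 1% of cases.
    lower, upper = rare_event_limits(p=0.01)
    print(lower, upper)
```

Because the plotted statistic is the gap between events rather than the event count itself, a single event does not by itself produce a signal; only an unusually short or unusually long gap does.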
Increasing Efficiency by Parallel Processing. Working with big data is often time consuming and challenging. The primary goal in programming is to maximize throughput while minimizing the use of computer processing time, real time, and programmers' time. This paper demonstrates the development and application of a parallel processing program on a large amount of health-care data. This paper presents example code to demonstrate each of these capabilities.
In studies where randomization is not possible, imbalance in baseline covariates (confounding by indication) is a fundamental concern.
Propensity score matching (PSM) is a popular method to minimize this potential bias, matching individuals who received treatment to those who did not, to reduce the imbalance in pre-treatment covariate distributions. PSM methods continue to advance as computing resources expand. Optimal matching, which selects the set of matches that minimizes the average difference in propensity scores between mates, has been shown to outperform less computationally intensive methods.
However, many find the implementation daunting. High-quality effective graphs not only enhance understanding of the data but also facilitate regulators in the review and approval process. A variety of graphs can be quickly produced using convenient built-in options in SG procedures. This paper focuses on key requirements and common issues for new SAS Grid users, especially if they are coming from a traditional environment.
Sometimes users experience data set size differences during grid migration. A few important reasons for data set size difference are demonstrated. We also demonstrate how to create new custom scripts as per business needs and how to incorporate them with SAS Grid Manager engine.
This paper presents the use of latent class analysis (LCA) to base the identification of a set of mutually exclusive latent classes of individuals on responses to a set of categorical, observed variables. The demonstration includes guidance on data management prior to analysis, PROC LCA syntax requirements and options, and interpretation of output.
The current study looks at several ways to investigate latent variables in longitudinal surveys and their use in regression models. Three different analyses for latent variable discovery are briefly reviewed and explored. The latent variables are then included in separate regression models. The effect of the latent variables on the fit and use of the regression model compared to a similar model using observed data is briefly reviewed.
The data used for this study was obtained via the National Longitudinal Study of Adolescent Health, a study distributed and collected by Add Health. Is uniqueness essential for your reports? The report themes affect the colors, fonts, and other elements that are used in tables and graphs. The paper explores how to access SAS Theme Designer from the SAS Visual Analytics home page, how to create and modify report themes that are used in SAS Visual Analytics, how to create report themes from imported custom themes, and how to import and export custom report themes.
In practice, the estimation of the LoD uses a parametric curve fit to a set of panel member (PM1, PM2, PM3, and so on) data where the responses are binary. Typically, the parametric curve fit to the percent detection levels takes on the form of a probit or logistic distribution.
Business Intelligence users analyze business data in a variety of ways. Seventy percent of business data contains location information. For in-depth analysis, it is essential to combine location information with mapping. This paper demonstrates and discusses the new partnership with Esri and the new capabilities added to SAS Visual Analytics. Manually retyping the macro variables and their values in the local workspace after they have been created on the server workspace would be time-consuming and error-prone, especially when we have quite a number of macro variables and values to bring over.
The same approach can also be used to bring macro variables and their values from the local to the server workspace. Business problems have become more stratified and micro-segmentation is driving the need for mass-scale, automated machine learning solutions. Additionally, deployment environments include diverse ecosystems, requiring hundreds of models to be built and deployed quickly via web services to operational systems. The tool is completely customizable, allowing you transparent access to all modeling results.
Immediate benefits include efficient model deployments, which allow you to spend more time generating insights that might reveal new opportunities, expose hidden risks, and fuel smarter, well-timed decisions. Although limited to a small fraction of health care providers, the existence and magnitude of fraud in health insurance programs requires the use of fraud prevention and detection procedures.
Data mining methods are used to uncover odd billing patterns in large databases of health claims history. Efficient fraud discovery can involve the preliminary step of deploying automated outlier detection techniques in order to classify identified outliers as potential fraud before an in-depth investigation. An essential component of the outlier detection procedure is the identification of proper peer comparison groups to classify providers as within-the-norm or outliers.
This study refines the concept of peer comparison group within the provider category and considers the possibility of distinct billing patterns associated with medical or surgical procedure codes identifiable by the Berenson-Eggers Type of Service (BETOS). The study focuses on the specialty General Practice and involves two steps: first, the identification of clusters of similar BETOS-based billing patterns; and second, the assessment of the effectiveness of these peer comparison groups in identifying outliers.
The working data set is a sample of the summary of data of physicians active in health care government programs made publicly available by the CMS through its website. In , nearly half of all Americans 65 and older had no health insurance. The difference, of course, is Medicare. Despite this success, the rising costs of health care in general and Medicare in particular have become a growing concern.
Medicare policies are important not only because they directly affect large numbers of beneficiaries, payers, and providers, but also because they affect private-sector policies. Analyses of Medicare policies and their consequences are complicated both by the effects of an aging population that has changing cost drivers (such as less smoking and more obesity) and by different Medicare payment models.
For example, the average age of the Medicare population will initially decrease as the baby-boom generation reaches eligibility, but then increase as that generation grows older. Because younger beneficiaries have lower costs, these changes will affect cost trends and patterns that need to be interpreted within the larger context of demographic shifts.
FFS, originally based on payment methods used by Blue Cross and Blue Shield in the mids, pays providers for individual services (for example, physicians are paid based on the fees they charge). MA is a capitated payment model in which private plans receive a risk-adjusted rate.
Knowledge Community Agency Semantics Dynamic Ontology and Taxonomy Mapping Semantic Alerts Optimizations Semantic "News" Images Dynamically Choosing Semantic Images Semantic "Mark Object as Read" Multi-Select Object Lens Ontology-Based Filtering and Spam Management Results Refinement Semantic Management of Information Stores Slide-Rule Filter User Interface Semantic Relevance Score Semantic Relevance Filter Time-Sensitivity Filter Knowledge Type Semantic Query Implementations Smart Styles Overview Implicit and Dynamic Smart Style Properties The Notification Manager (NM) Watch Group Monitors The Watch Pane The Watch Window Watch List Addendum Portfolios or Entity Collections Sample Scenarios SQML Generation SQML Parsing Email Control APIs Person Control APIs System Control Events People Groups Identity Metadata Federation Access Control Introducing the Create Bookmark Wizard Semantic Threads Semantic Thread Conversations Semantic Thread Management The Smart Hourglass Visualizations -- Context Templates
Patent Examiners to conduct a robust prior art search in very little time.
And, while the research tools available to Examiners have improved dramatically in the last several years, those tools still have many shortcomings. Among the shortcomings are that most of the research tools are text based, rather than meaning based. So, for example, the search tool on the PTO website will search for particular words in particular fields in a document.
Similarly, the advanced search tool on Google enables the Examiner to locate documents with particular words, or particular strings of words, or documents without a particular word or words. However, in each case, the search engine does not allow the Examiner to locate documents on the basis of meaning. So, for example, if there is a relevant reference that teaches essentially the same idea, but uses completely different words e. Even if the Examiner could spare the time to imagine and search every possible synonym, or even synonymous phrase, for the key words critical to the invention, the search could still overlook references, because sometimes the same idea can be expressed without using any of the same words at all, and sometimes the synonymous idea is not neatly compressed into a phrase but distributed over several sentences or paragraphs.
The reason for this is that words do not denote or connote meaning one to one as, for example, numerals tend to do. Despite this infinite many-to-many network of possibilities, human beings can (because of context, experience, reasoning, inference, deduction, judgment, learning and the like) isolate probable meanings, at least tolerably effectively most of the time.
The current prior art computer automated search tools e. The presently preferred embodiment of my invention bridges this gap considerably because it can search on the basis of meaning. For example, using some of the search functions of the preferred embodiment of the present invention, the Examiner could conduct a search and, with no more effort or time than presently invested, obtain search results relevant to patentability even if they did not contain a single word in common with the key words chosen by the Examiner.
Therefore, the system would obtain results relevant to the Examiner's task that would not ordinarily be located by present systems because it can locate references on the basis of meaning. Also on the basis of meaning, it can exclude irrelevant references, even if they share a key word or words in common with the search request.
In other words, one problem in prior art research is the problem of a false positive; results that the search engine "thought" were relevant merely because they had a key word in common, but that were in fact totally irrelevant because the key word, upon closer inspection in context, actually denoted or connoted an irrelevant idea. Therefore, the Examiner must search for the needle in the haystack, which is a waste of time.
In contrast, using some of the search functions of the preferred embodiment of the present invention, the density of relevant search results increases dramatically, because the system is "intelligent" enough to omit search results that, despite the common key words, are not relevant. Of course, it is not perfect in this respect any more than human beings are perfect in this respect. But it is much more effective at screening irrelevant results than present systems, and in this respect it more closely resembles, in function or in practice, an intelligent research assistant than a mere keyword-based search engine.
The specific mechanics of using the system this way, in one example, would work as follows: Imagine the Examiner is assigned to examine an application directed to computer software for a more accurate method of interpreting magnetic resonance data and thereby generating more accurate diagnostic images. To search for relevant prior art using the search functions of the preferred embodiment of the present invention, the Examiner would: a.
Using the Create Entity wizard, create a "Topic" entity with the relevant categories in the various contexts in which "Magnetic Resonance Imaging" occurs. As an illustration, Figures 1 and 2 show where "Magnetic Resonance Imaging" occurs in a Pharmaceuticals taxonomy. Notice that there are several contexts in which the category occurs. Add the relevant categories to the entity and apply the "OR" operation. Essentially, this amounts to defining the entity "Magnetic Resonance Imaging" as it relates to YOUR specific task as being equivalent to all the occurrences of Magnetic Resonance Imaging in the right contexts based on the patent application being examined.
Name the new entity "Magnetic Resonance Imaging" and perhaps "imaging" and "diagnostic" or some variations and combinations of the same. Drag and drop the "Magnetic Resonance Imaging" Topic entity to the Dossier special agent or default knowledge request icon in the desired profile the profile is preferably configured to include the "Patent Database" knowledge community.
Alternatively, the request can be created by using the Create Request Wizard. To do this, select the Dossier context template and select the "Patent Database" knowledge community as the knowledge source for the request. Alternatively, you can configure the profile to include the "Patent Database" knowledge community and simply use the selected profile for the new request. Hit Next - the wizard intelligently suggests a name for the request based on the semantics of the request. The wizard also selects the right default predicates based on the semantics of the "Magnetic Resonance Imaging" "Topic" entity.
Because the wizard knows the entity is a "Topic," it selects the right entities that make sense in the right contexts. Hit Finish. In the foregoing example, the results could be drawn, ultimately, from any source. Preferably, some of the results would have originated on the Web, some on the PTO intranet, some on other perhaps proprietary extranets. Regardless of the scope or origin of the original documents, by use of the system they have been automatically processed, and automatically "read" and "understood" by the system, so that when the Examiner's query was initiated, and also "read" and "understood" semantically, and by context, the system locates all relevant, and only relevant results.
Again, not perfectly, but radically more accurately than in any prior systems. Note also that the system does not depend on any manual tagging or categorization of the documents in advance. While that would also aid in accuracy, it is so labor intensive as to utterly eclipse the advantages of online research in the first place, and is perfectly impractical given the rate of increase of new documents.
In this scenario, the Examiner may also wish to use additional features of the preferred embodiment of the invention. Find all Experts in Magnetic Resonance Imaging: a. Follow steps above. Drag and drop the "Magnetic Resonance Imaging" entity to the Experts special agent or default knowledge request icon in the desired profile.
As such, the default predicate is selected based on the intersection of these two arguments "in" since this is what makes sense. BioTech Company Research Scenario Biotech companies are research intensive, not only in laboratory research, but in research of the results of research by others, both within and outside of their own companies.
Unfortunately, the research tools available to such companies have shortcomings. Proprietary services provide context-sensitive and useful results, but those services themselves have inferior tools, and thus rely heavily on indexing and human effort, and subscriptions to expensive specialized journals, and as a consequence are very expensive and not as accurate as the present system.
On the other hand, biotech researchers can search inexpensively using Google, but it shares all the key word based limitations described above. In contrast, using the search features of the preferred embodiment of the present invention, a biotech researcher could more efficiently locate more relevant results. Specifically, the researcher might use the system as follows. Select the Marketing distribution list result and click "Save as Entity" - this saves the object as a "Team" entity because the semantic browser "knows" the original object is a distribution list - as such, a "Team" entity makes sense in this context.
Select the Research distribution list result and click "Save as Entity" - this saves the object as a "Team" entity because the semantic browser "knows" the original object is a distribution list. Using the Create Entity Wizard, create a new "Team" entity and select the "Marketing" and "Research" team entities as members. Name the new entity "Marketing or Research". Using the Create Request Wizard, select the Headlines context template, and then select the "Marketing or Research" entity as a filter.
Also, select the Genomics category and the Anatomy category. Next, select the "AND" operator. The wizard also selects the right default predicates based on the semantics of the "Marketing or Research" team entity "by anyone in". Because the wizard knows the entity is a "Team," it selects "by anyone in" by default since this makes sense. In addition, the researchers may wish to Find all Experts in Marketing or Research: a.
Drag and drop the "Marketing or Research" entity to the Experts special agent or default knowledge request icon in the desired profile. Drag and drop the "Marketing or Research" entity to the Dossier special agent or default knowledge request icon in the desired profile.
For each competitor, create a new "competitor" entity under "companies" using the Create Entity Wizard. Select the right filters as needed. For instance, a competitor with a well-known English name - like "Groove" should have an entity that includes categories in which the company does business and also the keyword. Using the Create Entity Wizard, create a portfolio entity collection and add all the competitor entities you created in step a.
Name the entity collection "My Competitors." Using the Create Request Wizard, select the Breaking News context template and add the portfolio entity collection you created in step b. Keep the default predicate selection. They could instruct the system to alert them on "Breaking News on our Competitors", as follows: a. Create the "Breaking News on My Competitors" request as described above. Add the request to the request watch list. The semantic browser will now display a watch pane e.
In addition, the researchers may wish to keep records of competitors for future reference, and to have them constantly updated. The system will create and update such records, by the researchers instructing the system to Show a collection of Dossiers on each of our competitors, as follows: a. Create entities for each of your competitors as described in 4a.
For each competitor entity, create a new Dossier on that competitor by dragging the entity to the Dossier icon for the desired profile - this creates a Dossier on the competitor. Using the Create Request Wizard, create a new request collection blender and add each of the Dossier requests created in step b.
Hit Next - the wizard intelligently suggests a name for the request collection. The wizard launches a request collection that contains the individual Dossiers. You can then add the request collection as a favorite and open it every day to get rich, contextual competitive intelligence. The researchers may wish to review a particular dossier, and can do so by instructing the system to Show a Dossier on the CEO e.
Select the result and click "Save as Entity" - this saves the object as a "Person" entity because the semantic browser "knows" the original object is a person - as such, a "Person" entity makes sense in this context. Using the Create Request Wizard, select the Dossier context template, and then select the "John Smith" entity as a filter.
The wizard also selects the right default predicates based on the semantics of the "John Smith" person entity. The system itself is described in greater detail below. These are listed and described below. They are not arranged in order of importance, or in any particular order. While the preferred embodiment of the present invention would allow the user to use any or all of these features and improvements described below, alone or in combination, no single feature is necessary to the practice of the invention, nor any particular combination of features.
Also, in this application, reference is made to the same terms as are defined in my parent application Serial No. In this case, the user can select text within the object and the lens will be applied using the selected text as the object dynamically generating new "images" as the selection changes. This way, the user can "lens" over a configurable subset of the object metadata, as opposed to being constrained to "lens" over either the entire object or nothing at all.
For example, the user can select a piece of text in the Presenter and hit the "Paste as Lens" icon over the object in which the text appears. The Presenter will then pass the text to the client runtime component e. The method then returns the new SQML. For details of the preview window and the preview window controls , please refer to my parent application Serial No.
In the preferred embodiment, the preview window will: - Disappear after a timer expires (maybe ms); on mouse move, the timer is preferably reset (this will avoid flashing the window when the user moves the mouse around the same area). The preferred embodiment also has the following features: 1. One selection range per object but multiple selections per results-set is the best option.
Otherwise, the system would result in a confusing user experience and complex UI to show lens icons per selection per object as opposed to per object. Outstanding lens query requests which are regular SQML queries, albeit with SQML dynamically generated consistent with the agent lens should be cancelled when the Presenter no longer needs them e.
In any case, such cancellation is not critical from a performance or bandwidth standpoint because lens queries will likely only ask for a few objects at a time. Regardless, the Presenter has to deal with stale results by dropping them on the floor; it will have to do this anyway whether or not lens queries are also cancelled.
There will be a window of delay between when the Presenter issues a cancel request and when the cancellation actually is complete. Because some results can trickle in during this time, they need to be discarded. Thus, the preferred embodiment has asynchronous cancellation implementations - the software component has been designed to always be prepared to ignore bad or stale results.
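The stale-result handling described above can be sketched as follows. This is a minimal illustration, not the Presenter's actual design: the class name `LensRequestTracker`, the integer request ids, and the string payloads are all hypothetical. The idea shown is that cancellation only updates local bookkeeping, and any result that arrives for a query no longer wanted is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class LensRequestTracker:
    """Tracks outstanding lens queries and drops results for cancelled ones."""

    _next_id: int = 0
    _active: set = field(default_factory=set)
    accepted: list = field(default_factory=list)

    def issue(self) -> int:
        """Register a new lens query and return its (hypothetical) request id."""
        self._next_id += 1
        self._active.add(self._next_id)
        return self._next_id

    def cancel(self, request_id: int) -> None:
        """Stop wanting a query; the server may still send results for a while."""
        self._active.discard(request_id)

    def on_result(self, request_id: int, payload: str) -> bool:
        """Accept a result only if its query is still wanted; otherwise ignore it."""
        if request_id not in self._active:
            return False  # stale result that trickled in after cancellation
        self.accepted.append((request_id, payload))
        return True


if __name__ == "__main__":
    tracker = LensRequestTracker()
    first = tracker.issue()
    second = tracker.issue()
    tracker.cancel(first)
    print(tracker.on_result(first, "late result"))
    print(tracker.on_result(second, "fresh result"))
```

Checking on arrival rather than waiting for the cancellation to be confirmed means the caller never blocks during the delay window.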
The Presenter preferably has both icons indicating the current lens request state and tool-tips: When the user hovers over or clicks on an object, the Presenter can put up a tool tip with the words, "Requesting Lens Info" or words to that effect. When the info comes back, hovering will show the "Found 23 Objects" tip and clicking will show the results.
This interstitial tool tip can then be transitioned to the preview window if it is still up when the results arrive. In addition, note that the smart selection lens, like the smart lens, can be applied to objects other than textual metadata. For instance, the Smart Selection Lens can be applied to images, video, a section of an audio stream, or other metadata. In these cases, the Presenter would return the appropriate SRML consistent with the data type and the "selection region."
The rest of the smart lens functionality would apply as described above, with the appropriate SQML being generated based on the SRML (which in turn is based on the schema for the data type under the lens).
Pasting Person Objects Overview
The Information Nervous System (which, again, is one of our current shorthand names for certain aspects of our presently preferred embodiments) also supports the drag and drop or copy and paste of 'Person' objects (People, Users, Customers, etc.).
Pasting a Person object on a smart request representing a Knowledge community or Agency from whence the Person came. In this case, the server's semantic query processor merely resolves the SQML from the client using the Person as the argument. For instance, if the user pastes or drags and drops a person 'Joe' on top of a smart request 'Headlines on Reuters,' the client will create a new smart request using the additional argument.
The Reuters Information Nervous System Web service will then resolve this request by returning all Headlines published or annotated by 'Joe.' Pasting a Person object on a smart request representing a Knowledge community or Agency from whence the Person did not come. In this case, because the Person object is not in the semantic network of the destination Knowledge community (on its SMS), the server's semantic query processor would not be able to make sense of the Person argument.
As such, the server must resolve the Person argument in a different way, such as, for example, using the categories on which the person is an expert (in the preferred embodiment) or a newsmaker. For instance, taking the above example, if the user pastes or drags and drops a person 'Joe' on top of a smart request 'Headlines on Reuters' and Joe is not a person on the Reuters Knowledge community, the Reuters Web service (in the preferred embodiment) must return Headlines that are "relevant to Joe's expertise."
First, it must ask the Knowledge community that the person belongs to for "representative data" (SRML) that represents the person's expertise. The Web service resolves this request by: a. Querying the Knowledge community e. Note that there could be several semantic domains. Querying the Knowledge community from whence the person object came for that person object's semantic domain information. If the semantic domains are identical or if there is at least one common semantic domain, the client queries the Knowledge community from whence the person came for the person's categories of expertise.
If the semantic domains are not identical or there is not at least one common semantic domain, the client queries the Knowledge community from whence the person came for several objects that belong to categories on which the person is an expert.
In the preferred embodiment, the implementation should pick a high enough number of objects that accurately represent the categories of expertise (this number is preferably picked based on experimentation).
The reason for picking objects in this case is that the destination Web service will not understand the categories of the Knowledge community from whence the person came and as such will not be able to map them to its own categories. Alternatively, a category mapper can be employed via a centralized Web service on the Internet that maps categories between different Knowledge Communities. In this case, the destination Knowledge community will always be passed categories as part of the SQML, even though it does not understand those categories - the Knowledge community will then map these categories to internal categories using the category mapper Web service.
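The choice between sending categories and sending representative objects, together with the alternative category mapper, can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the function `choose_person_arguments`, the `CategoryMapper` class, its `publish` and `resolve` methods, and all sample values are hypothetical names, and a real mapper would be a remote Web service rather than a dictionary.

```python
def choose_person_arguments(source_domains, destination_domains,
                            expert_categories, representative_objects):
    """Pick what to send to the destination community in place of the Person."""
    if set(source_domains) & set(destination_domains):
        # At least one shared semantic domain: the categories are understood as-is.
        return {"kind": "categories", "values": list(expert_categories)}
    # No shared domain: send sample objects that stand in for the expertise.
    return {"kind": "objects", "values": list(representative_objects)}


class CategoryMapper:
    """Hypothetical in-memory stand-in for the centralized category mapper."""

    def __init__(self):
        self._mappings = {}

    def publish(self, source_category, destination_community, destination_category):
        """Record how a source category maps into a destination community."""
        self._mappings[(source_category, destination_community)] = destination_category

    def resolve(self, source_category, destination_community):
        """Return the mapped category, or None when no mapping was published."""
        return self._mappings.get((source_category, destination_community))


if __name__ == "__main__":
    choice = choose_person_arguments(
        ["pharma"], ["news"], ["Oncology"], ["doc-1", "doc-2"])
    print(choice["kind"])

    mapper = CategoryMapper()
    mapper.publish("Oncology", "Reuters", "Health/Cancer")
    print(mapper.resolve("Oncology", "Reuters"))
```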
The category mapper Web service will have methods for resolving categories as well as methods for publishing category mappings.
Saving and Sharing Smart Requests Overview
Users of the Information Nervous System semantic browser (the Information Agent or Librarian) will also be able to save smart requests to disk, email them as an attachment, or share them via Instant Messenger (also as an attachment) or other means.
The client application will expose methods to save a smart request as a sharable document. The client application will also expose methods to share a smart request document as an attachment in email or Instant Messenger. It provides a safe, serialized representation of a semantic query that, among other features, can protect the integrity and help protect the intellectual property of the specification.
For example, the query itself may embody trade secrets of the researcher's employer, which, if exposed, could enable a competitor to reverse engineer critical competitive information to the detriment of the company. The protection can be accomplished in several ways, including by strongly encrypting the XML version of the semantic query (the SQML) or via a strong one-way hash. The sharable document has an extension (.REQ) that represents the request. An extension handler on the client operating system is installed to handle this extension.
When a document with the extension is opened, the extension handler is invoked to open the document. The extension handler opens the document by extracting the SQML from the secure stream, and then creating a smart request in the semantic namespace with the SQML. The handler then opens the smart request in the semantic namespace.
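A round trip through a request document can be sketched as follows. The byte layout here (a `REQ1` signature, a 4-byte length, a SHA-256 digest, then the SQML) is purely an assumption for illustration and is not the format shown in Figure 3. The sketch covers only the integrity check: a bare hash detects corruption but does not hide the query and does not stop a deliberate rewrite, so confidentiality would still require the encryption option mentioned above.

```python
import hashlib
import struct

MAGIC = b"REQ1"  # hypothetical file signature, not from the specification


def pack_request(sqml: str) -> bytes:
    """Serialize an SQML string into a binary document with an integrity digest."""
    payload = sqml.encode("utf-8")
    digest = hashlib.sha256(payload).digest()  # 32 bytes
    return MAGIC + struct.pack(">I", len(payload)) + digest + payload


def unpack_request(blob: bytes) -> str:
    """Extract the SQML, refusing documents that are malformed or altered."""
    if blob[:4] != MAGIC:
        raise ValueError("not a request document")
    (length,) = struct.unpack(">I", blob[4:8])
    digest, payload = blob[8:40], blob[40:40 + length]
    if len(payload) != length or hashlib.sha256(payload).digest() != digest:
        raise ValueError("request document is truncated or has been altered")
    return payload.decode("utf-8")


if __name__ == "__main__":
    document = pack_request("<sqml>Headlines on Reuters</sqml>")
    print(unpack_request(document))
```

An extension handler in this sketch would call `unpack_request` on the file contents and hand the resulting SQML to whatever creates the smart request.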
When a smart request in the semantic namespace is saved, or if the user wants to send it as an email attachment, the client serializes the SQML representing the smart request in the binary .REQ format and saves it at the requested directory path or opens the email client with the .REQ document as an attachment. Figure 3 shows the binary document format that encapsulates the SQML buffer with the smart request and also illustrates how the extension handler opens the document. A similar model can also be employed for sharing results via SRML. Figure 4A and 4B shows an illustration of two. The first request document is 'live' and the second one is a snapshot at a particular time (they are both time-sensitive requests).
When the document is opened, the semantic query gets opened in the application. ENT extension to represent an entity. REQ or an entity,. This will enable a scenario where the user wants to share a smart request but not have it be "live.
However, if the user wants to share "[Current] Breaking News on Reuters related to this document," a smart snapshot will be employed. If the user indicates a smart request, the process described above in Part 3 is employed. When the recipient of the binary document receives it by email, instant messaging, etc. When the recipient opens the smart request, the client's semantic query processor will send the processed SQML to the server's XML web service as previously described.
Virtual Knowledge Communities

Virtual Knowledge Communities (agencies) refer to a feature of the Information Nervous System that allows the publisher of a knowledge community to publish a group of servers so that they appear as though they were one server. For instance, Reuters could have per-industry Reuters Knowledge Communities for pharmaceuticals, oil and gas, manufacturing, financial services, etc.
The semantic browser will then pick up the SQML and display an icon for the knowledge community as though it were a single server. Any action on the knowledge community will be propagated to each server in the SQML. If the user does not have access for the action, the Web service call will fail accordingly; otherwise the action will be performed just as if the user had manually created a blender containing the Knowledge Communities.
Implementing Time-Sensitive Semantic Queries

Semantic queries that are time-sensitive are preferably implemented in an intelligent fashion to account for the rate of knowledge generation at the knowledge community (agency) in question. For instance, 'Breaking News' on a server that receives 10 documents per second is not the same as 'Breaking News' on a server that receives 10 documents per month.
As such, the server-side semantic query processor would preferably adjust its time-sensitive semantic query handling according to the rate at which information accumulates at the server. To implement this, general rules of thumb could be used, for instance: - The most recent N objects where N is adjusted based on the number of new objects per minute.
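One way to read this rule of thumb is as a clamp on the number of objects expected within a fixed window. The sketch below is illustrative only: the function name, the 60-minute window, and the floor and ceiling values are assumptions, not values from the text.

```python
def recent_object_count(new_objects_per_minute: float,
                        window_minutes: float = 60.0,
                        floor: int = 5,
                        ceiling: int = 200) -> int:
    """Pick N, the number of most recent objects to return.

    N tracks how many objects the server is expected to receive within the
    window, clamped so that a slow server still returns a few results and a
    busy server does not return an unbounded number.
    """
    expected = new_objects_per_minute * window_minutes
    return max(floor, min(ceiling, round(expected)))
```

With these assumed defaults, a server receiving 10 documents per second hits the ceiling, while a server receiving 10 documents per month falls back to the floor.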
N can also be adjusted based on whether the query is a Headline or Breaking News. In the preferred embodiment, newsmaker queries are implemented with the same time sensitivity parameters as Headlines.

Text-To-Speech Skins

Overview

Text-to-speech is implemented at the object level and at the request level. Figure 5 shows a diagram illustrating a text-to-speech object skin.
When executed, the pipeline shown in Figure 5 results in the following voice output:
1. Reading Email Message
2. Appropriate Delay
3. Message From Nosa Omoigui
4. Appropriate Delay
5. Message Sent to John Smith
6. Appropriate Delay
7. Message Copied To Joe Somebody
8. Message Subject Is Web services are software building blocks used for distributed computing
9. Appropriate Delay
10. Message Summary is Web services
11. Appropriate Delay
Like the example shown above (which is for email), the implementation should use appropriate text-to-speech templates for all information object types, in order to capture the semantics of the object type. At the request level, the semantic browser's presentation engine (the Presenter) loads a skin that takes the SRML for all the current objects being rendered (based on the user-selected cursor position) and then invokes the text-to-speech object skin for each object.
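The object-level and request-level skins can be pictured as a per-type template applied to each object in turn. The sketch below is an assumption-laden illustration: the field names, the `<delay>` marker, and the function names are hypothetical, and a real skin would be an XSLT transform over SRML rather than Python over dictionaries.

```python
from typing import Dict, List, Tuple

# Hypothetical template for the email object type: (spoken prefix, field name).
EMAIL_TEMPLATE: List[Tuple[str, str]] = [
    ("Message From", "from"),
    ("Message Sent to", "to"),
    ("Message Copied To", "cc"),
    ("Message Subject Is", "subject"),
    ("Message Summary is", "summary"),
]

DELAY = "<delay>"  # placeholder for a pause between utterances


def email_utterances(obj: Dict[str, str]) -> List[str]:
    """Object-level skin: turn one email object into a sequence of utterances."""
    steps = ["Reading Email Message"]
    for prefix, field in EMAIL_TEMPLATE:
        value = obj.get(field)
        if value:  # skip fields the object does not carry
            steps.append(DELAY)
            steps.append(f"{prefix} {value}")
    return steps


def request_utterances(objects: List[Dict[str, str]]) -> List[str]:
    """Request-level skin: apply the object skin to each object, in order."""
    steps: List[str] = []
    for obj in objects:
        steps.extend(email_utterances(obj))
    return steps
```

Other information object types would get their own template table, so that the spoken prefixes match the semantics of each type.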
This essentially repeats the text-to-speech action for each XML object being rendered, one after another. Figure 6 shows an illustration of several email objects being presented in the semantic browser via a request skin.

Language Translation Skins

Language translation skins are implemented similarly to text-to-speech skins except that the transform is on the language axis.
The XSLT skin (smart style) can invoke a software engine to automatically perform language translation in real time and then generate XML that is encoded in Unicode (16 bits per character) in order to account for the universe of languages.

Language agnostic semantic queries

Semantic queries can also be invoked in a language-agnostic fashion. This is implemented by having a translation layer (the SQML language translator) that translates the SQML that is generated by the semantic browser to a form that is suitable for interpretation by the KDS (or KBS), which in turn has a knowledge domain ontology seeded for one or more languages.
The SQML language translator translates the objects referred to by the predicates e. The results are then translated back to the original language by the language translation skin. Categories as First Class Objects in the User Experience This refers to a feature by which categories of a knowledge community are exposed to the end user.
The end user will be able to issue a query for a category as an information type e. Visualizations, dynamic links, context palettes, etc. This feature is useful in cases where the user wants to start with the category and then use that as a pivot for dynamic navigation, as opposed to starting off with a smart request smart agent that has the category as a parameter. Categorized Annotations Categorized annotations follow from categories being first-class objects.
Users will be able to annotate a category directly - thereby simulating an email list that is mapped to a category. Additional Context Templates 1. Experts - The Experts feature was indicated as a special agent in my parent application Serial No. As should have also been understood from that application, the Experts feature can also operate in conjunction with the context templates section. This context template returns People that have shown interest in any semantic category in the semantic network.
A very real-world scenario will have Experts returning people that have answers and Interest Group returning results of people that have questions or answers. In the preferred embodiment, this is implemented by returning results of people who have authored information that in turn has been categorized in the semantic network, with the knowledge domains configured for the KIS.
Essentially, this context template presents the user with dynamic, semantic communities of interest. It is a very powerful context template. Currently, most organizations use email distribution lists or the like to indicate communities of interest.
However, these lists are hard to maintain and require that the administrator manually track (or guess) which people in the organization should belong to the list(s). With the Interest Group context template, however, the "lists" now become intelligent and semantic (akin to "smart distribution lists"). They are also contextual, a feature that manual email distribution lists lack. As with other context templates, the Interest Group context predicate in turn is interpreted by the server-side semantic query processor.
In the preferred embodiment, the context template should have a time-limit for which it detects "areas of interest." The logic here is that if the user has not authored any information (most typically email) that is semantically relevant to the SQML filter (if available) in three months, the user either has no interest in that category or categories or had an interest but doesn't any longer.
Annotations of My Items - this is a context template that is a variant of Annotations but is further filtered with items that were published by the calling user. Importing and Exporting User State The semantic browser will support the importation and exportation of user state. This state will include information and metadata on: - Default user state e.
When the XML document is imported, the semantic browser will navigate the XML document nodes and add or set the user state types in the client environment corresponding to the nodes in the XML document.

Local Smart Requests

Local smart requests would allow the user to browse local information using categories from a knowledge community (agency).
In the case of categorized local requests, the semantic client crawls the local hard drives, email stores, etc. The knowledge community then responds with the category assignment metadata. The client then updates the local semantic network via the local SMS and responds to semantic queries just like the server would.
Essentially, this feature can provide functionality equivalent to a local server without the need for one. Integrated Navigation Integrated Navigation allows the user to dynamically navigate from within the Presenter in the main results pane on the right and have the navigation be integrated with the shell extension navigation on the left. Essentially, this merges both stacks. In the preferred embodiment, this is accomplished via event signaling. When the Presenter wants to dynamically navigate to a new request, it sets some state off the GUID that identifies the current browser view.
When the Presenter wants to navigate to a new request, it creates the request in the semantic environment and caches the returned ID of the request. To catch the navigation event, the browser view starts a worker thread when it first starts. This thread waits on the navigation event and also simultaneously waits on a shutdown event that gets signaled when the browser view is being terminated - in Windows, it does this via a Win32 API named 'WaitForMultipleObjects'.
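The worker-thread pattern can be sketched in Python, with one substitution stated plainly: the standard library has no direct counterpart to 'WaitForMultipleObjects', so a single queue stands in as the one place where the thread blocks for either a navigation or a shutdown signal. The names are hypothetical, and the request ID travels with the signal here rather than through per-view state as in the text.

```python
import queue
import threading

NAVIGATE, SHUTDOWN = "navigate", "shutdown"


def view_worker(signals: queue.Queue, navigate) -> None:
    """Block until a navigation or shutdown signal arrives, then act on it."""
    while True:
        kind, request_id = signals.get()  # wakes on whichever signal comes first
        if kind == SHUTDOWN:
            return  # the browser view is being terminated
        navigate(request_id)


# Minimal usage: the list stands in for the browser view's navigate action.
visited = []
signals = queue.Queue()
worker = threading.Thread(target=view_worker, args=(signals, visited.append))
worker.start()
signals.put((NAVIGATE, "request-42"))  # what the Presenter would signal
signals.put((SHUTDOWN, None))          # signalled when the view closes
worker.join()
```

Because the queue preserves order, a navigation that was signalled before shutdown is still handled before the thread exits.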
If the navigation event is signaled, the 'Wait' API returns indicating that the navigation event was signaled. The worker thread then looks up the registry to retrieve the navigation state the object id and the path. Hints for Visited Results The Nervana semantic browser empowers the user to dynamically navigate a knowledge space at the speed of thought.
The user could navigate along context, information or time axes. For instance, the user can navigate from a local document to 'Breaking News' and then from one of the 'Breaking News' result objects to 'Headlines. This is equivalent to browsing the Web and hitting the same pages over and over again from different 'angles. The Presenter then indicates redundant results to the user by showing the results in a different color or some other UI mechanism.
The local cache is aged, preferably after several hours or the measured time of a typical 'browsing experience'. Old entries are purged and the cache is eventually reset after enough time has elapsed. The semantic browser will also handle duplicate results by removing duplicates before rendering them in the Presenter - for instance, if objects with the same metadata appear on different Knowledge Communities (agencies). The semantic browser will detect this by performing metadata comparisons.
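A minimal sketch of an aged visited-results cache follows. The class name, the four-hour default, and the injectable clock are assumptions made for illustration; the text only says that entries age out after several hours or a typical browsing session.

```python
import time
from typing import Callable, Dict


class VisitedCache:
    """Remembers which result objects were shown recently."""

    def __init__(self, max_age_seconds: float = 4 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age_seconds
        self._clock = clock
        self._seen: Dict[str, float] = {}  # object id -> time last shown

    def purge(self) -> None:
        """Drop entries older than the maximum age."""
        cutoff = self._clock() - self._max_age
        self._seen = {k: t for k, t in self._seen.items() if t >= cutoff}

    def mark(self, object_id: str) -> bool:
        """Record a result; return True if it was already seen recently."""
        self.purge()
        already = object_id in self._seen
        self._seen[object_id] = self._clock()
        return already
```

The Presenter could call `mark` for each result it renders and use the return value to decide whether to show the result in a different color.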
For unstructured data like documents, email, etc. Knowledge Federation Client-Side Knowledge Federation Client-side Knowledge Federation which allows the user to federate knowledge communities and operate on results as though they came from one place this federation feature was described in my parent Application Serial No. Server-Side Knowledge Federation Server-Side Knowledge Federation is technology that allows external knowledge to be federated within the confines of a knowledge community.
For instance, many companies rely on external content providers like Reuters to provide them with information. However, in the Information Nervous System, security and privacy issues arise - relating to annotations, personal publications, etc.
Many enterprise customers will not want sensitive annotations to be stored on remote servers hosted and managed by external content providers. To address this, external content providers will provide their content on a KIS metadata cache, which will be hosted and managed by the company.
A user might get a document (or any other semantic result) from Server A but might want to annotate that object on one or more agencies (KISes) that do support annotations (more typically, Intranet or Extranet-based agencies that do have a domain of trust and verifiable identity). In such a case, the annotation email message would include the URI of the object to be annotated (the email message and its attachment(s) would contain the annotation itself).
When the server crawls its System Inbox and picks up the email annotation, it scans the annotation's encoded To or Subject field and extracts the URI for the object to be annotated. This is very powerful because it implies that users of the agency would then view the annotation and also be able to semantically navigate to the annotated object even though that object came from a different server. If the destination server for the annotation does not have access to the server on which the object to be annotated resides, the destination server informs the client of this and the client then has to get the SRML from the server on which the object resides and send the complete SRML back to the destination server for the annotation.
This embodiment essentially implies that the client must first "de-reference" the URI and send the SRML to the destination server, rather than having the destination server attempt to "de-reference" the URI itself. Semantic Alerts for Federated Annotations In the same manner that semantic browser would poll each KIS in the currently viewed user profile for "Breaking News" relevant to each currently viewed object on a regular basis e.
Essentially, this resembles polling whether each object that is currently displayed "was just annotated." However, for federated annotations, the process is a bit more complicated because it is possible that a copy of the object has been annotated on a different KIS even though the KIS from whence the object came doesn't support annotations or contain an annotation for the specific object.
In this case, for each object being displayed, the semantic browser would poll each KIS in the selected profile and pass the URI of the object to "ask" the KIS whether that object has been annotated on it. This way, semantic alerts will be generated even for federated annotations. This can be cached when the KIS detects an annotation (typically from the System Inbox) and is updating the semantic network. This context attribute then becomes a performance optimizer because for those objects with the attribute set, the client wouldn't have to query the KIS again to check if the object has been annotated.
This amounts to caching the state of the object to avoid an extra and unnecessary roundtrip call to the KIS. That way, if a user views the object, the inbox that is associated with the object's context is always available for viewing. In other words,

Category Naming and Identification URIs for Federated Knowledge Communities

This refers to how categories will be named on federated knowledge communities.
In the preferred embodiment, every category will be qualified with at least the following properties:
- Knowledge Domain ID - this is a globally unique identifier that uniquely identifies the knowledge domain from whence the category came
- Name - this is the name of the category
- Path - this is the full taxonomy path of the category
In the preferred embodiment, the category's knowledge domain ID, and not the name, is used in the category URI, because the category could be renamed as the knowledge domain evolves but the identifier should remain the same.
Anonymous Annotations and Publications

The semantic browser will also allow users to anonymously annotate and publish to a knowledge community (agency). In this mode, the metadata is completely stored with the user identity but is flagged indicating that the publisher wishes to remain anonymous.
Alternately, the administrator will also be able to configure the knowledge community (agency) such that the inference engine cannot infer using anonymous annotations or publications.

Offline Support in the Semantic Browser

The semantic browser will also have offline support. The browser will have a cache for every remote call. The cache will contain entries to XML data.
Each call is given a unique signature by the semantic browser and this signature is used to hash into the XML data. For instance, a semantic query is hashed by its SQML. Other remote calls are hashed using a combination of the method name, the argument names and types, and the argument data. For every call to the XML Web Service, the semantic runtime client will extract the signature of the call and then map this to an entry in the local cache. If the browser or the system is currently offline, the client will return the XML data in the cache if it exists.
If it does not exist, the client will return an error to the caller (likely the Presenter). If the browser is online, the client will retrieve the XML data from the XML Web Service and update the cache by overwriting the previous contents of the file entry with a file path indicated by the signature hash. Standard protocols are preferably employed where possible and the Web service layer should use interoperable Web service standards and avoid proprietary implementations.
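The offline cache described above (one file entry per remote call, keyed by a hash of the call signature) might look roughly like the following. The class and function names, the use of SHA-256, and the `.xml` file naming are assumptions for illustration; `remote` is a stand-in for the real Web service proxy.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def call_signature(method: str, args: Dict[str, Any]) -> str:
    """Hash the method name plus argument names, types, and data."""
    parts = [method] + [
        f"{name}:{type(value).__name__}:{value!r}"
        for name, value in sorted(args.items())
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


class OfflineCache:
    def __init__(self, directory: Path) -> None:
        self._dir = directory
        self._dir.mkdir(parents=True, exist_ok=True)

    def call(self, method: str, args: Dict[str, Any], online: bool,
             remote: Callable[[str, Dict[str, Any]], str]) -> str:
        entry = self._dir / (call_signature(method, args) + ".xml")
        if online:
            xml = remote(method, args)              # fetch fresh data
            entry.write_text(xml, encoding="utf-8")  # overwrite the old entry
            return xml
        if entry.exists():
            return entry.read_text(encoding="utf-8")
        raise LookupError("offline and no cached response for this call")


# Minimal usage with a fake remote service and a temporary cache directory.
def fake_remote(method: str, args: Dict[str, Any]) -> str:
    return "<results/>"


cache = OfflineCache(Path(tempfile.mkdtemp()))
online_xml = cache.call("Query", {"sqml": "<q/>"}, online=True, remote=fake_remote)
offline_xml = cache.call("Query", {"sqml": "<q/>"}, online=False, remote=fake_remote)
```

A semantic query would pass its SQML as the argument data, so that two identical queries map to the same cache entry.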
Essentially, the test is that the semantic browser does not have to "know" whether the Knowledge community or agency Web service it is talking to is running on a particular platform over another. For example, the semantic browser need not know whether the Web service it is talking to is running on Microsoft's.
The Knowledge community Web service and the client server protocol should employ Web service standards that are commonly supported by different Web service implementations like. In an ideal world, there will be a common set of standards that would be endorsed and properly implemented across Web service vendor implementations. However, this might not be the case in the real world, at least not yet. To handle a case where the semantic browser must handle unique functionality in different Web service implementations, the Knowledge community schema is preferably extended to include a field that indicates the Web service platform implementation.
For instance, a .NET implementation of the Knowledge community is preferably published with a field that indicates that the platform is. The semantic browser will then have access to this field when it retrieves the metadata for the Knowledge community either directly via the WSDL URL to the Knowledge community, or by receiving announcements via multicast, the enterprise directory e.
This is not a recommended approach but if it is absolutely necessary to make platform-specific calls, this model is preferably employed in the preferred embodiment. Knowledge Modeling Knowledge Modeling refers to the recommended way enterprises will deploy an Information Nervous System. This involves deploying several KIS servers per high-level knowledge domain and one or at most few KDS formerly KBS servers that host the relevant ontology and taxonomy.
Of course, the specific point of balance will shift over time as the hardware and software technologies evolve, and the preferred embodiment does not depend on the particular balance struck. In addition, KIS servers are preferably deployed where access control becomes necessary at the server level for higher-level security as opposed to imposing access control at the group level with multiple groups sharing the same KIS.
Also, optionally, these researchers' publications and annotations will not be viewable on the corporate KIS. Figure 7 illustrates an example of a possible knowledge architecture for a pharmaceuticals company. These rules could be as simple as purging any metadata older than a certain age between years depending on the company's policies for keeping old data and which does not have any annotations and that is not marked as a favorite or rated.
In the presently preferred embodiment, the workflow and component integration would be as follows: 1 Shell: User implicitly creates a SQML query i. The Presenter loads default Chrome contained in the page. Summarization of local resources happens here. All summarization follows one of two paths: Summarize the doc indicated by this file path, or summarize this text extracted from clipboard, Outlook, Exchange, etc.
It also signals an event on request completion or timeout. The callback is into the Presenter, which means inter-process messaging to pass the XML. The target is an argument to the behavior itself and is defined by the root page. The Presenter also loads the smart style, which then loads semantic images, motion, etc.
Figure 8 illustrates the presently preferred client component integration and interaction workflow described above. Overview The Categories Dialog Box allows the user to select one or more categories from a category folder or taxonomy belonging to a knowledge domain. While more or fewer can be deployed in certain situations, in the preferred embodiment, the dialog box has all of the following user interface controls: I.
Profile - this allows the user to select a profile with which to filter the category folders or taxonomies based on configured areas of interest. For instance, if a profile has areas of interest set to "Health and Medicine," selecting that profile will display only those category folders that belong to the "Health and Medicine" area of interest for instance, Pharmaceuticals, Healthcare, and Genes.
Area of Interest - this allows the user to select a specific area of interest. By default, this combo box is set to "My Areas of Interest" and the profile combo box is set to "All Profiles. This is advantageous to distinguish publishers that might have name collisions. The domain zone allows the user to select the scope of the domain name.
In the preferred embodiment, the options are Internet, Intranet, and Extranet. The zone selection further distinguishes the published category folder or taxonomy. A fairly common case would be where a department in a large enterprise has its own internal taxonomy. Category Folder - this allows the user to select a category folder or taxonomy. When this selection is made, the categories for the selected category folder are displayed in the categories tree view.
Search categories - this allows the user to enter one or more keywords with which to filter the currently displayed categories. For instance, a Pharmaceuticals researcher could select the Pharmaceuticals taxonomy but then enter the keyword "anatomy" to display only the entries in the taxonomy that contain the keyword "anatomy. Search Options - these controls allow the user to specify how the dialog box should interpret the keywords. The options allow the user to select whether the keywords should apply to the entire hierarchy of each entry in the taxonomy tree, or whether the keywords should apply to only the [end] names of the entries.
The user interface breaks the category hierarchy into "category pages" - for performance reasons. The UI allows the user to navigate the pages via buttons and a slide control. There is also a "Deselect All" button that deselects all the currently selected taxonomy items. Explore Button - this is the main invocation button of the dialog box. When the dialog box is launched from the Create Request Wizard, this button is renamed to "Add" and adds the selected items to the wizard "filters" property page.
When the dialog box is launched directly from the application, the button is titled "Explore" and when clicked launches a Dossier request on the selected categories. The features described above are illustrated in Figures 9 - 11, which show three different views of the Explore Categories dialog box. Client-Assisted Server Data Consistency Checking As the server KIS crawls knowledge sources, there will be times when the server's metadata cache is out of sync with the sources themselves.
For instance, a web crawler on the KIS that periodically crawls the Web might add entries into the semantic metadata store (SMS) that become out of date. In this case, the client would get an error when it tries to invoke the source URI. For data source adapters (DSAs) that have monitoring capabilities (for instance, for file-shares that can be monitored for changes), this wouldn't be much of an issue because the KIS is likely to be in sync with the knowledge source(s).
My parent application Serial No. However, in some situations this approach might impair performance because the CC would have to periodically scan the entire SMS and confirm whether the indexed objects still exist. An alternative embodiment of this feature of the invention is to have the client (the semantic browser) notify the server if it gets an error.
To do this, the semantic browser would have to track when it gets an error for each result that the user "opens." In this case, if the source web server reports an error (object not found), the client should report this to the KIS. When the KIS gets an error report from the client, it then intelligently decides whether this means the object is no longer available. The KIS cannot arbitrarily delete the object because it is possible that the error was due to an intermittent Web server failure (for instance, the directory on the Web server could have been temporarily disabled).
The KIS should itself then attempt to asynchronously download the object or at the very least, the HTTP headers in the case of a Web object several times e. This alternate technique could be roughly characterized as lazy consistency checking. In some situations, it may be advantageous and preferred. However, for performance reasons, it is sometimes advantageous if the server does not perform strict duplicate-detection. In such cases, duplicate detection is best performed at the client.
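The lazy consistency check described above, in which the KIS re-probes an object after a client report before deciding it is gone, can be sketched as follows. The source text is cut off before saying what happens after the retries, so the rule used here (treat the object as gone only if every attempt fails) is an assumption, as are the names and the default of three attempts. Scheduling the attempts asynchronously and spacing them out is left out.

```python
from typing import Callable


def object_is_gone(uri: str, probe: Callable[[str], bool],
                   attempts: int = 3) -> bool:
    """Return True only if every probe reports the object as missing.

    `probe` stands in for downloading the object (or just its HTTP headers)
    and returns True when the object is still reachable.
    """
    for _ in range(attempts):
        if probe(uri):
            return False  # one success is enough to keep the object
    return True
```

A single failed fetch reported by a client therefore never deletes an entry on its own; it only triggers the server's own checks.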
Furthermore, because the client federates results from several KISes, it is possible for the client to get duplicates from different KISes. As such, it is advantageous if the client also performs duplicate detection. In the preferred embodiment, the client removes objects that are definitely duplicates and flags objects that are likely duplicates. For objects for which summary extraction is difficult, it is recommended that the title also be used to check for likely duplicates i.
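Client-side duplicate handling of this kind (drop definite duplicates, flag likely ones) might be sketched as below. The matching rules are assumptions chosen for illustration: identical title and summary counts as a definite duplicate, and a matching title alone counts as a likely duplicate. The text does not spell out the exact comparisons.

```python
from typing import Dict, List, Tuple


def dedupe(results: List[Dict[str, str]]) -> List[Tuple[Dict[str, str], bool]]:
    """Return (result, is_likely_duplicate) pairs with definite duplicates removed."""
    seen_exact = set()
    seen_titles = set()
    kept: List[Tuple[Dict[str, str], bool]] = []
    for item in results:
        exact_key = (item.get("title", ""), item.get("summary", ""))
        if exact_key in seen_exact:
            continue  # definite duplicate: same title and summary, so drop it
        title_key = item.get("title", "").strip().lower()
        likely = title_key in seen_titles  # same title seen with other metadata
        seen_exact.add(exact_key)
        seen_titles.add(title_key)
        kept.append((item, likely))
    return kept
```

The Presenter could then render flagged results with a visual hint instead of hiding them, since a likely duplicate may still be a distinct object.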
Client-Side Virtual Results Cursor

The client semantic browser also provides the user with a seamless user experience when there are multiple knowledge communities (agencies) subscribed to a user profile. Similarly, the browser preferably presents the user with one navigation cursor - as the user scrolls, the semantic browser re-queries the KISes to get more results.
In the preferred embodiment, the semantic browser keeps a results cache big enough to prevent frequent re-querying - for instance, the cache can be initialized to handle enough results for between 5 and 10 scrolls (pages). The cache size is preferably capped based on memory considerations. As the cursor is advanced (or retreated), the browser checks if the current page generates a cache hit or miss.
If it generates a cache hit, the browser presents the results from the cache; otherwise it re-queries the KISes for additional results, which it then adds to the cache. The cache can be implemented to grow indefinitely or to be a sliding window. The former option has the advantage of simplicity of implementation with the disadvantage of potentially high memory consumption.
The latter option, which is the preferred embodiment, has the advantage of lower memory consumption and higher cache consistency but with the cost of a more complex implementation. With the sliding window, the semantic browser will purge results from pages that do not fall within the window e. It does this via what the inventor calls "virtual single sign-on.
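The sliding-window results cache can be sketched as a page cache centred on the cursor. The class name, the page-number keying, and the default window size are assumptions for illustration; `fetch_page` stands in for re-querying the KISes.

```python
from typing import Callable, Dict, List


class SlidingResultsCache:
    """Keeps only the result pages that lie near the current cursor position."""

    def __init__(self, fetch_page: Callable[[int], List[str]],
                 window: int = 5) -> None:
        self._fetch = fetch_page
        self._window = window
        self._pages: Dict[int, List[str]] = {}

    def page(self, number: int) -> List[str]:
        if number not in self._pages:  # cache miss: re-query for this page
            self._pages[number] = self._fetch(number)
        # Purge pages that fall outside the window centred on the cursor.
        half = self._window // 2
        for cached in list(self._pages):
            if abs(cached - number) > half:
                del self._pages[cached]
        return self._pages[number]

    def cached_pages(self) -> List[int]:
        return sorted(self._pages)
```

Scrolling back to a page that has left the window costs one more query, which is the price paid for the bounded memory footprint.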
As such, the ratio of the number of knowledge communities to the number of authentication credentials per user is likely to be very high. If it is, the semantic browser retrieves the credentials for the CCTE and logs the user on with those credentials. Note that the semantic browser also supports pass-through authentication when the operating system is already logged on to a domain.
For instance, if a Windows machine is already logged on to an NT or Active Directory domain, the client-side Web service proxy also includes the default credentials to attempt to logon to a KC. In the preferred embodiment, the credentials are preferably saved by default unless the user indicates otherwise. If the user wants the credentials purged, the semantic browser should remove a KC from a CCTE in which it exists when that KC is no longer subscribed to any profile in the browser.
The virtual single sign-on feature, like many of the features in this application, could be used in applications other than with my Information Nervous System or the Virtual Librarian. For example, it could be adapted for use by any computer user who must log into more than one domain. Namespace Object Action Matrix The table below shows the actions that the semantic browser invokes when namespace objects are copied and pasted onto other namespace objects.
Knowledge domain plug-ins that are published by Nervana or that are provided to Nervana by third-party ontology publishers will be hosted on a central Web service an ontology depot on the Nervana Web domain Nervana. Each KDS will then periodically poll the central Web service via a Web service call for each of its knowledge domain plug-ins, referenced by the URI or a globally unique identifier of the plug-in and will "ask" the Web service if the plug-in has been updated.
The Web service will use the last-modified timestamp of the ontology file to determine whether the plug-in has been updated. If the plug-in has been updated, the Web service will return the new ontology file to the calling KDS. The KDS then replaces its ontology file. If the KDS is running during the update, it will ordinarily temporarily stop the service before replacing the file, unless it supports file-change notifications and reloads the ontology which is the recommended implementation.
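The polling exchange between a KDS and the ontology depot can be sketched in-process as follows. All names are hypothetical, the Web service call is replaced by a direct method call, and the service stop or reload step around the file replacement is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Plugin:
    last_modified: float  # timestamp of the ontology file
    ontology: bytes       # contents of the ontology file


class OntologyDepot:
    """Stand-in for the central Web service that hosts plug-ins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def publish(self, plugin_id: str, plugin: Plugin) -> None:
        self._plugins[plugin_id] = plugin

    def check_update(self, plugin_id: str,
                     known_timestamp: float) -> Optional[Plugin]:
        """Return the plug-in only if it is newer than the caller's copy."""
        plugin = self._plugins.get(plugin_id)
        if plugin is not None and plugin.last_modified > known_timestamp:
            return plugin
        return None


class KnowledgeDomainServer:
    def __init__(self, depot: OntologyDepot) -> None:
        self._depot = depot
        self.installed: Dict[str, Plugin] = {}

    def poll(self) -> int:
        """Ask the depot about each installed plug-in; return how many changed."""
        updated = 0
        for plugin_id, current in list(self.installed.items()):
            newer = self._depot.check_update(plugin_id, current.last_modified)
            if newer is not None:
                self.installed[plugin_id] = newer  # replace the ontology file
                updated += 1
        return updated
```

Calling `poll` on a timer gives the periodic check described above; a second poll with no new publication changes nothing.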
investments super diversified forex outlet forex tester professional eu industrial r for beginners pdf forex tracking tool steuerfrei forex salary eagles mp3 forex flag signal 21688 windham run investments property investment forum ukrajina rbc invest in yourself investment banking cryptocoin trading nkomo human investment grade audit a real intertemporal model with investment solutions group of companies jrc.
john's antigua forex dave mt4 indicator forex technical moi monroe alt ho investments lakewood investments crossword chart strategy e-books online return on usd bank investment forex td ameritrade dividend reinvestment. louis investments fidelity investments trading strategies palak forex praca marynarz florida lkp securities brokerage. Shiner investment banker mike china investment development company limited forex traders salaries forum liteforex onila trupa the philippines investment grade investment plcc forex raptor reviews of forex trading courses online navajo vest orgatus forex naudas tirgus sigulda fineco forex orari pdf writer hotforex withdrawal forex scalping system forum equation vaamo investment calculator reviews on network forex investment thesis value investing volt resistance management bank vest copywriter york mellon investment zennou investment jobs ch 17 income definition investments investment daily profit worksheet lunala fisher investments banking reference architectures youtube star realty and investments.